[57]



ABSTRACT 



The language acquisition system assists the user in acquiring 
the language of an application. The system uses the dialogue 
context, a dialogue model and syntactic-semantic grammars 
to progressively build commands which, to the application 
program, are syntactically and semantically correct in the 
current context and which can be interpreted by the dialogue 
server which then controls the application program. The 
system is independent of any particular application language.
The system is also multimodal and supports both
speech and text input. A toolkit is provided to add this 
functionality to virtually any application program. 

37 Claims, 11 Drawing Sheets 
Description of the Current Dialogue Model



For illustration let's consider the case of a Phone System application: 



The Five Top-Level Dialogue Structures and Some Examples 



1 - Direct Answer to the Current Question 

Sys: What else? 

Usr: Erase new messages 

Sys: Are you sure ? 

Usr: Yes 



2 - Over-Informative Answer 

Sys: What else ? 
Usr: Call 

Sys: Who should I call ? 
Usr: Call David 



3 - Ellipsis on the Last Command 

Sys: What else? 
Usr: Play first message 
Sys: Playing message 1 
Usr: Second message 



4 - Corrective Expression 

Sys: What else? 
Usr: Call David 
Usr: No Brian 



5 - Dialogue Meta-Command

Sys: What else ? 
Usr: Erase new messages 
Sys: Are you sure ? 
Usr: Repeat 



Figure 5
Figure 9

Rule #   Grammar Entries

         Define <Color> as
         {
#0         -> red                           @:="Color=Red()" ;
#1         -> black                         @:="Color=Black()" ;
#2         -> green                         @:="Color=Green()" ;
#3         -> dark green                    @:="Color=Green(Intensity=Low())" ;
           Default                          @:="Color=?()" ;
         }

         Define <Name> as
         {
#4         -> alpha                         @:="Name=Alpha()" ;
#5         -> beta                          @:="Name=Beta()" ;
           Default                          @:="Name=?()" ;
         }

         Define <Item> as
         {
#6         -> cube                          @:="Cube" ;
#7         -> sphere                        @:="Sphere" ;
           Default                          @:="?" ;
         }

         Define <NounPhrase> as
         {
#8         -> the <Item>                    @:="Item=$(Attributes{})"(@1) ;
#9         -> the <Color> <Item>            @:="Item=$(Attributes{$})"(@2,@1) ;
#10        -> the <Item> <Name>             @:="Item=$(Attributes{$})"(@1,@2) ;
#11        -> the <Color> <Item> <Name>     @:="Item=$(Attributes{$,$})"(@2,@1,@3) ;
           Default                          @:="Item=?()" ;
         }

         Define <TopLevel> as
         {
#12        -> <NounPhrase>                  @:=@1 ;
#13        -> Reduce <NounPhrase>           @:="Pred{Command=Reduce(Case{Obj{$}})}"(@1) ;
#14        -> <Color>                       @:=@1 ;
           Default                          @:="?" ;
         }

Reference numerals shown in the figure: 150, 152, 154, 156, 158, 160, 162.
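
The semantic part of the <NounPhrase> rules in Figure 9 can be read as a template in which each $ is replaced, left to right, by the semantic value of a referenced constituent (@1, @2, ...). The Python sketch below illustrates that reading for rules #8 to #11 only. It is an assumption-laden illustration, not the patented parser: the function names `fill` and `noun_phrase` and the dictionary layout are hypothetical, and the two-word terminal "dark green" is left out.

```python
# Terminal rules read from Figure 9 (word -> semantic expression).
# "dark green" is omitted to keep this sketch to single-word terminals.
COLOR = {"red": "Color=Red()", "black": "Color=Black()", "green": "Color=Green()"}
NAME = {"alpha": "Name=Alpha()", "beta": "Name=Beta()"}
ITEM = {"cube": "Cube", "sphere": "Sphere"}


def fill(template, args):
    """Replace each '$' in the template, left to right, with the next argument."""
    values = list(args)
    out = []
    for ch in template:
        out.append(values.pop(0) if ch == "$" else ch)
    return "".join(out)


def noun_phrase(words):
    """Return the semantic expression for 'the [color] item [name]', or None."""
    if len(words) < 2 or words[0] != "the":
        return None
    rest = list(words[1:])
    color = COLOR.get(rest[0])
    if color is not None:
        rest.pop(0)
    if not rest or rest[0] not in ITEM:
        return None
    item = ITEM[rest.pop(0)]
    if len(rest) > 1 or (rest and rest[0] not in NAME):
        return None
    name = NAME[rest[0]] if rest else None
    attributes = [a for a in (color, name) if a is not None]
    # Rules #8 to #11 differ only in the number of attribute slots.
    template = "Item=$(Attributes{" + ",".join("$" * len(attributes)) + "})"
    # The item value fills the first '$', matching (@2,@1,@3) in rule #11.
    return fill(template, [item] + attributes)


print(noun_phrase("the red cube alpha".split()))
```

The argument order passed to `fill` mirrors the figure: the item value comes first, followed by the color and name attributes when present.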
Incremental construction of a tree

Given a mask, the procedure will create a sub-tree whose arcs represent the constraints
that must take place to generate the associated syntactic form(s) in relation with the
grammar. When no arc is attached to a node, then the whole corresponding grammar
rule can be used for expansion.

1 - SetConstraintSemanticCompletion( &Tree, "Item=$(Attributes{Color=$()})",
                                     &SysGrm, &TskGrm)

2 - SetConstraintSemanticCompletion( &Tree, "Color=$($)", &SysGrm, &TskGrm)

(Dollar sign ($) indicates that any string substitution is allowed)



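[Tree diagrams: the constraint tree before call #1 (an empty Tree), after call #1 (Tree, TopLevel, NounPhrase and Item nodes joined by OR arcs and AND arcs, with rule numbers at the leaves) and after call #2 (a Color node added beside NounPhrase, with rules #3, #2, #1 and #0 beneath it).]

Figure 10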
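
As a small illustration of the mask notation in Figure 10, the Python sketch below treats each $ in a semantic mask as a wildcard and tests whether a semantic expression fits the mask. This shows only the wildcard convention stated in the figure; it is not the SetConstraintSemanticCompletion procedure, which builds tree arcs from the grammars. The helper names are hypothetical, and allowing $ to match an empty string is an assumption.

```python
import re


def mask_to_regex(mask):
    """Compile a semantic mask; each '$' matches any string (possibly empty)."""
    literal_parts = [re.escape(part) for part in mask.split("$")]
    return re.compile(".*".join(literal_parts))


def matches(mask, expression):
    """True if the whole expression fits the mask."""
    return mask_to_regex(mask).fullmatch(expression) is not None


# Semantic expressions of the kind produced by the Figure 9 grammar.
print(matches("Color=$($)", "Color=Green(Intensity=Low())"))
print(matches("Item=$(Attributes{Color=$()})", "Item=Cube(Attributes{Color=Red()})"))
print(matches("Color=$($)", "Name=Alpha()"))
```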
Figure 11

Building semantic masks from the dialogue context

Dialogue Context:

    What do you want?
    - Play track 5
    What else?

Parsing Tables

    Context (Read):
        Pred{$}:   Order=Play($), Order=Raise($), Order=Pause()
        Case{$}
        Obj{$}:    Item=Program($), Item=Track($)
        Attr{$}:   Number=$()

    Immediate Read:
        Pred{$}:   Order=Play($), Order=Raise($), Order=Pause()

Semantic Masks Generated

    Read             Flags   Mask
    Context          NPE     Pred{Order=Play($)}
    Context          NPE     Pred{Order=Raise($)}
    Context          NPE     Pred{Order=Pause()}
    Context          NPE     Order=Play($)
    Context          NPE     Order=Raise($)
    Context          NPE     Order=Pause()
    Context          NPE     Case{Obj{Item=Track($)}}
    Context          NPE     Case{Obj{Item=Program($)}}
    Context          NPE     Case{Obj{Item=#($)}}
    Context          NPE     Obj{Item=Track($)}
    Context          NPE     Obj{Item=Program($)}
    Context          NPE     Obj{Item=#($)}
    Context          NPE     Item=Track($)
    Context          NPE     Item=Program($)
    Context          NPE     Item=#($)           (# = Pronominal Reference)
    Context          NPE     Attr{Number=$()}
    Context          NPE     Number=$()
    Immediate Read   _P_     Pred{Order=Play($)}
    Immediate Read   _P_     Pred{Order=Raise($)}
    Immediate Read   _P_     Pred{Order=Pause()}
    Immediate Read   _P_     Order=Play($)
    Immediate Read   _P_     Order=Raise($)
    Immediate Read   _P_     Order=Pause()

N: Form can be denied [ form #4 ]
P: Form can be used as it is [ forms #2 (and #1 in case of the immediate Read) ]
E: Form can be used as an ellipsis [ form #3 ]
SUPERVISED CONTEXTUAL LANGUAGE
ACQUISITION SYSTEM 

This is a continuation of U.S. patent application Ser. No.
08/201,893, filed Feb. 25, 1994, entitled "Supervised
Contextual Language Acquisition System," now abandoned.

BACKGROUND AND SUMMARY OF THE 
INVENTION 

The present invention relates generally to natural language
modeling, speech and dialogue recognition by computer.
More specifically, the invention relates to a multimodal
dialogue environment and user interface to assist a user
in acquiring the language of an application or computer
program using text and speech input. The system allows
users unfamiliar with the language or available commands
of an application or computer program to progressively
build sentences which will have meaning to the application
or computer program.

The introduction of text and speech input and output
channels in applications and computer programs responds to
a growing need for user-friendly interfaces. There is,
nevertheless, still a long way to go. A more natural
interaction between human and machine is necessary before
complex computing machines, application software and
computer programs will be truly useful to the masses.

By way of example, consider the interface between the 
average human user and a full-featured database manage- 
ment application. Although the human user may be quite 
intelligent, he or she may not be fully acquainted with the 
capabilities and features of the application program and may 
not have the time or desire to consult the necessary series of 
help screens and users' manuals to find out. As a result,
many of the functions and features of the application are 
unused. 

In one respect, the problem may be that even complex
computer applications and computer programs do not provide
the flexible input/output bandwidth that humans enjoy
when interacting with other humans. Until that day arrives,
the human user is relegated to the position of having to learn
or acquire a precise knowledge of the language that the
computer application can understand and a similar knowledge
of what the application will and will not do in response.
More precisely, the human user must acquire a knowledge
of enough nuances of the application language to allow the
user to communicate with the application in syntactically
and semantically correct words or phrases.

Point and click graphical user interfaces were developed 
in part to simplify the human/machine interface. While icon 
and menu selection is helpful in simplifying some types of 
human/machine interaction, often the use of such systems 
entails a great deal of effort by the user, with continual 
shifting between keyboard and mouse. To some, today's 
graphical user interfaces are like playing charades with the 
computer when the user would rather simply speak. 

Communication using natural language speech for input
and output has the potential to greatly improve naturalness
and ease of use. However, the mere use of voice-based
systems does not guarantee the success of a user interface,
since human users, accustomed to speaking with other
humans, naturally expect a complete dialogue environment.
The ability to recognize simple words or sentences is not
enough. The complete dialogue environment needs to take
into account a history of what has been said before and needs
to provide the human user with the ability to correct, amplify
and explain previous statements. The complete dialogue
environment should also have the ability to handle speech
recognition errors.

The work by others on improving the user interface has
typically centered on individual aspects of the problem. For
example, U.S. Pat. No. 5,103,498 to Lanier et al. entitled
"Intelligent Help System," describes an intelligent help
system which processes information specific to a user and a
system state. The system incorporates a monitoring device to
determine which events to store as data in an historical
queue. These data, as well as non-historical data (e.g. system
state) are stored in a knowledge base. An inference engine
tests rules against the knowledge base data, thereby providing
a help tag. A display engine links the help tag with an
appropriate solution tag to provide help text for display.

U.S. Pat. No. 5,237,502 to White entitled "Method and
Apparatus for Paraphrasing Information Contained in Logical
Forms," discloses a computer-implemented system for
creating natural language paraphrasing of information
contained in a logical form.

U.S. Pat. No. 5,239,617 to Gardner et al. entitled "Method
and Apparatus Providing an Intelligent Help Explanation
Paradigm Paralleling Computer User Activity," describes an
on-line, interactive intelligent help system which provides
suggestions as to actions a user can take after entry into the
system of an erroneous command or a question. The system
responds with explanations of why the suggestions were
made and how they work.
The system includes a natural language analyzer for
converting the questions into goals. A knowledge base and
an inference engine further analyze the goals and provide
one or more suggestions on how to achieve such goals. An
explanation generator uses such analysis to dynamically
generate the explanations which are tailored to the user's
goals.

U.S. Pat. No. 5,241,621 to Smart entitled "Management
Issue Recognition and Resolution Knowledge Processor,"
describes a knowledge processing system with user interface
for prompting the user to enter information and for receiving
entered information from the user. The user interface is
coupled to a knowledge model processor that includes a
dialogue control interpreter that provides structured
messages to a user to elicit responses from the user. The elicited
information is stored in a user awareness database. The
dialogue control interpreter operates according to
predetermined dialoguing imperatives to elicit, record and access
user responses in sequences that guide and motivate the user
to follow predetermined sequences of thought, based on the
recorded user awareness database.

U.S. Pat. No. 5,255,386 to Prager entitled "Method and
Apparatus for Intelligent Help That Matches the Semantic
Similarity of the Inferred Intent of Query or Command to a
Best-Fit Predefined Command Intent," describes a data
processing system which suggests a valid command to a user
when the user enters a question or an erroneous command.
The purposes of various commands executable by the system
are stored as a plurality of intents. When the user enters
a question or an erroneous command, the system looks up
the intent corresponding to it and semantically compares
such an intent with other intents. When another intent is
found to be within a predetermined degree of similarity,
based on the comparison, the command defined by such
intent is offered as a suggestion to the user.
The above art does not provide a general mechanism for
language acquisition in a multimodal framework.
Furthermore, the methods described in the above art do not
take into account characteristics inherent in the speech
modality which allow the user to input words, sentence
fragments or complete sentences. Finally, the above art does
not implement a dialogue system, does not provide a
dialogue environment and does not provide a toolkit to develop
dialogue-oriented applications.

The present invention provides a dialogue system and a
complete dialogue environment to allow a human user to
communicate with an application using natural language
words, sentence fragments and complete sentences. The
dialogue system keeps track of the dialogue being run by
maintaining a dialogue history. The dialogue history
contains the dialogue turns or exchanges already made and the
context in which they occur. The system also makes use of
the immediate context. The immediate context refers to what
is logically expected from the user at this time. The
immediate context is extracted from the application scripts
representing the scenario. Therefore the system automatically
takes into account what has been done and what can be done
next. It makes it possible to prompt the user with possible
sentences or fragments of sentences that can be understood
at that point in the dialogue. The stored history is also
available to allow the system to backtrack or revert to a
previous point in the dialogue, allowing the user to readily
cancel or change previously communicated dialogue.

The dialogue system provides a dialogue model which
defines the type of interaction that is supported. It is based
on the notion of meta-language or meta-commands.
Meta-commands are commands that the dialogue system handles
internally. Such commands are thus different from simple
commands that are intended to be meaningful to the target
application or computer program. Since meta-commands are
handled by the dialogue system, the specification of the
application program is simplified and focused on the task.
For example, the expression "paint the cube alpha in red" is
considered a simple expression which might be intended for
a target application such as a paint program. Conversely, the
expression "no in blue" is a meta-command, since it operates
on the dialogue structure itself. In this way, the word "no"
is intended for and therefore interpreted as an instruction to
the dialogue system to negate the previous statement and
correct it. Other examples of meta-commands include:
"Cancel," "Repeat" and "Help."

The dialogue system of the invention also includes a
mechanism for storing natural language grammar for the
target application (task grammar) and also for the dialogue
system itself (system grammar).

The grammars are used by the dialogue system for the
recognition of text and speech input and for generation as
well. They are syntactic-semantic grammars that describe
the sentences that are proper and syntactically correct along
with their meaning. A single representation formalism,
independent of the I/O modes and based on a case-frame
representation, is used as the semantic representation.

Application scripts are programs made of elementary
dialogue instructions that are interpreted by the dialogue system
to parse and evaluate user inputs. They describe which
user commands are valid and their logical
sequencing. They also describe the processing that must be
done in the target application when a user input has been
parsed.

Given an input event from the user, the dialogue system
will first try to recognize it and convert it into a semantic
expression. Then the expression will be tested against the
dialogue model to determine if it is a meta-command or a
simple command. If the inputted expression makes sense in
the dialogue context it will be interpreted and eventually
application scripts will be run. At each step of the
interpretation the dialogue history will be updated until a point is
reached where the dialogue system requires another user
input.

Because the dialogue system has a knowledge of the
ongoing dialogue, through the dialogue history and
immediate context, it is possible for the system to predict what
can be said next according to the dialogue model. To do so,
two trees are constructed by the system. They represent
syntactic constraints that are applied on the system grammar
and the task grammar. The trees are built using the existing
dialogue context, the built-in dialogue model and the system
and task grammars. Preferably semantic constraints
representing what can be understood are used to derive the trees.
The trees will then be used to generate the words, sequences
of words or sentences that can be successfully inputted.
Preferably the two trees are built after each user input. The
dialogue system provides methods for showing the computed
sublanguage to the user interactively. This way the
user can acquire an understanding of what the target
application can do and an understanding of the language that can
be used to communicate with the machine.

It is therefore an object of the invention to provide a
language acquisition system and user guidance system to
allow the user to interact with the target application even if
he or she has little or no prior experience with the
application. In this regard, it is another object of the invention to
allow the user to progress in the building of a sentence or
command, word by word or phrase by phrase, receiving
assistance from the dialogue system as needed. It is a further
object of the invention to automatically compensate for
mistakes in the building of sentences or commands, such
mistakes occurring, for example, through speech recognition
errors. It is also an object of the invention to provide a
dialogue system which can propose choices for the user in
the building of sentences or commands, the choices being
arranged in order of plausibility given the existing dialogue
context. Still further, the invention allows the language
acquisition mechanism, also called dialogue completion
mechanism, to be activated at the user's request.

More specifically, the invention in its preferred
embodiment uses two constraint trees to represent the possible
syntactic structures that can be generated from the dialogue
history and the immediate context. Given an initial string or
partial sentence, the trees will be used to generate the list of
possible next words (or completion list) to provide
assistance to the user in learning the application language. The
invention provides a protocol between the target application
and a dialogue system. The protocol allows information to
be exchanged between the user and the target application
and to give the user guidance in acquiring an understanding
of the target application language in context. The language
acquisition system is multimodal; it provides a mechanism
for both text and speech input, independently for any target
application, using natural language described by a
syntactic-semantic grammar. The dialogue system provides a
mechanism which allows a user to progressively acquire
knowledge of the language and to progressively build a sentence
or command without actually executing the command. In
this way, a user can browse through the features of a target
application without actually operating the application. The
mechanism also provides the user with an execution option
to allow the sentence or command to be executed by the
target application.

For a more complete understanding of the invention and
its objects and advantages, reference may be had to the
following specification and to the accompanying drawings.
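
The notion of a completion list (the possible next words after a partial sentence) can be illustrated with a short Python sketch. It uses the syntactic side of the Figure 9 task grammar and simply enumerates every sentence of that small, finite grammar. This brute-force enumeration is a stand-in chosen for brevity: the invention derives the list from constraint trees built from the dialogue context, which is not reproduced here. The names `GRAMMAR`, `expand` and `completion_list` are hypothetical.

```python
from itertools import product

# Syntactic side of the Figure 9 task grammar (semantic parts omitted).
GRAMMAR = {
    "<Color>": [["red"], ["black"], ["green"], ["dark", "green"]],
    "<Name>": [["alpha"], ["beta"]],
    "<Item>": [["cube"], ["sphere"]],
    "<NounPhrase>": [
        ["the", "<Item>"],
        ["the", "<Color>", "<Item>"],
        ["the", "<Item>", "<Name>"],
        ["the", "<Color>", "<Item>", "<Name>"],
    ],
    "<TopLevel>": [["<NounPhrase>"], ["Reduce", "<NounPhrase>"], ["<Color>"]],
}


def expand(symbol):
    """Yield every word sequence derivable from a symbol (grammar is finite)."""
    if symbol not in GRAMMAR:
        yield [symbol]  # terminal word
        return
    for rule in GRAMMAR[symbol]:
        for combo in product(*(list(expand(s)) for s in rule)):
            yield [word for part in combo for word in part]


def completion_list(partial):
    """Words that can follow the partial sentence in at least one valid sentence."""
    partial = list(partial)
    n = len(partial)
    return sorted({s[n] for s in expand("<TopLevel>")
                   if len(s) > n and s[:n] == partial})


print(completion_list(["the", "red"]))
print(completion_list(["Reduce"]))
```

An empty result, as for `["the", "cube", "alpha"]`, means the grammar offers no further word after that prefix.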
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating the architecture of 
the presently preferred dialogue system, showing the mod- 
ules used in text language acquisition highlighted in bold 
lines; 

FIG. 2 is a similar block diagram, showing the modules 
used in speech language acquisition highlighted in bold 
lines; 

FIG. 3 is a series of computer screens showing a typical 
sequence of user/computer interaction, useful in understand- 
ing the language acquisition mechanism of the invention; 

FIG. 4 is an exemplary screen showing text completion by 
the invention; 

FIG. 5 is a description of the dialogue model, using a 
telephone system application as an example; 

FIG. 6 is a flow diagram representing the actions which 
take place in text completion mode; 

FIG. 7 is an exemplary screen showing speech completion 
by the invention; 

FIG. 8 is a flow diagram representing the actions which 
take place in speech completion mode; 

FIG. 9 illustrates an example of a task grammar of the 
type utilized by the presently preferred dialogue system; 

FIG. 10 is a diagram illustrating how the constraint tree is
built;

FIG. 11 illustrates how the semantic masks are built from
the dialogue context.

DESCRIPTION OF THE PREFERRED 
EMBODIMENT 

An overview of the architecture of the presently preferred 
embodiment is illustrated in FIGS. 1 and 2. Both FIGS. 1 
and 2 are essentially the same, except that FIG. 1 highlights 
in bold lines those modules involved in text language 
acquisition, whereas FIG. 2 highlights those modules 
involved in speech language acquisition. Therefore only 
FIG. 1 will be described in detail. However, it will be 
understood that the following description also applies to 
FIG. 2. 

In FIG. 1 the dialogue system or dialogue server is 
depicted generally at 20 and the target application is 
depicted generally at 22. The dialogue system in turn 
comprises a dialogue manager 24, an input/output manager 
26 and an application manager 28. The dialogue manager 24 
serves as the intelligent interface between the input/output 
manager 26 and the application manager 28. The input/ 
output manager 26 is responsible for handling multimodal 
communication with the human operator. The application 
manager 28 is responsible for handling communication with 
the target application 22. The input/output manager 26, the 
dialogue manager 24 and the application manager 28 work 
together to provide a multimodal interface between the
human operator and the target application. 
Input/Output Manager 

More specifically, the input/output manager or I/O man- 
ager 26 is in charge of managing the input and output flow 
between the user and the dialogue manager 24. Specialized 
sub-modules are used to handle the different I/O modes. 

In input, the I/O manager's role is to recognize and
understand input events coming from the drivers (discussed
below) and to communicate their possible meanings to the
dialogue manager for interpretation. A single semantic
representation formalism, independent of the input device being
used, is defined based on a case-frame representation. For
each mode supported by the server, a recognizer is provided.
The recognizer decodes the signal coming from the attached
driver and tries to parse and interpret it. The result is a list
of possible meanings expressed in the semantic language.

As shown in FIG. 1, drivers can be either application
dependent (in which case applications are required to
handle the device) or application independent (in which
case they are directly handled by the dialogue server). In the
former case, the server provides functions to send input
messages via the application manager 28. Application
dependent drivers include the mouse driver and keyboard
driver, since they are very hardware and software dependent
(e.g. dependent on the type of keyboard, operating system
and graphic library used to develop the application). On the
other hand, application independent drivers include the
microphone and touch screen drivers, since these drivers and
their corresponding hardware are provided with the server.

In output, the I/O manager's role is to generate messages
which are understandable by the user. It contains generators
for each output mode, to derive signals from semantic
expressions representing the information to be
communicated. Here again, a driver (application dependent or
application independent) is attached to each generator. One
application dependent driver is the text driver, used for text
output. One application independent driver is the speaker
driver, used for speech output.

To each generator or recognizer is attached a device
database that contains the necessary knowledge and/or
expertise to operate the coding/decoding phase. Depending
on the device, it can be either a static database (text and
speech input and output) or a dynamically maintained
database (touch screen and mouse input). Device databases
are initialized at start-up and are configured through the
input configurer 81 that channels configuration requests
from the application or from the dialogue manager.

The presently preferred embodiment is multimodal, in the
sense that it allows multiple modes of human interaction.
The preferred embodiment supports keyboard input, mouse
input, touch screen, speech input and also provides both text
and speech output. While these modes of communication are
presently preferred for most computer applications, the
invention is not limited to these forms of communication. If
desired, the capability of handling other modes of
communication, such as three-dimensional position sensing
as data input and such as tactile feedback as data output, can
be included. Support for multiple modes of input and output
is provided by the input/output manager 26, which includes
an input handler 30 and an output handler 32.
The input handler 30 formats and labels the output of each
recognizer, so that the same formalism is used whatever
input device is used. It delivers a list of possible
interpretations, or semantic expressions, representing the
input event to the dialogue manager 24 for processing. The
input handler 30 is in turn coupled to a plurality of
recognizers 34a-34e (collectively referred to herein as
recognizers 34) and drivers. Given an input event, a
recognition/understanding phase will be started with the
objective to derive the best semantic expressions that
represent the event. In a typical embodiment, there may be one
recognizer for each mode of input. To the recognizers are
coupled the appropriate device drivers if needed. In FIG. 1 a
touch screen (TCS) driver 36 couples the touch screen 37 to
recognizer 34b. Mike driver (MIC) 38 couples the microphone
39 to its associated recognizer 34a. In FIG. 1 keyboard 40
and mouse 41 communicate directly with keyboard recognizer
(KBD) 34d and mouse recognizer (MOS) 34c without the need
for additional device drivers, as these would normally be
provided by the device manager of the application. In FIG. 1
an additional "generic" recognizer 34e has been illustrated to
show how additional input channels/input modes may be added.

The input handling architecture of the preferred embodi- 
ment may be viewed as a plurality of data conversion layers 
which take device specific information (e.g. from a 
keyboard, microphone, mouse) and convert that information 
into a semantic representation suitable for manipulation by 
the dialogue manager 24. The dialogue system uses a single 
representation formalism which is independent of the input 
or output devices. In this regard, the device drivers, such as 
touch screen driver 36 and mike driver 38 are programmed 
to convert the electrical signals produced by human inter- 
action with the input devices, into a data stream which the 
recognizers 34 can act upon. 

The presently preferred I/O manager 26 handles four
input devices: microphone (MIC) for speech input, keyboard
(KBD) for text input, mouse (MOS) for mouse input and
touch screen (TCS) for touch screen input. In the case of
speech and text input, the dialogue databases 74 are used for
recognition and understanding. Note that the microphone
and touch screen drivers are built into the dialogue server 20.
On the other hand, the mouse and keyboard drivers are
located on the application side, since the way events are
acquired depends on the graphic software used (e.g.
Windows, X-Windows, Apple System 7).

In the case of the speech input recognizer 34a, the data
stream from mike driver 38 might be a digitized
representation of the input speech waveform, which the speech
recognizer would process to identify phonemes and
ultimately words. In the case of the keyboard recognizer 34d,
the data stream from keyboard 40 may be in the form of a
sequence of ASCII characters or keyboard scan codes, which
the recognizer would group into words. Finally, the mouse
recognizer 34c might receive X-Y positional information
and key click event data as a serial data stream which the
mouse recognizer 34c would buffer through to the input
handler 30. In this regard, each of the recognizers supplies
its output to the input handler 30. If desired, the individual
recognizers can attach a header or other identifying label so
that the input handler 30 and the other modules further on in
the data stream will have knowledge of which mode was
used to input a given stream of data. The input handler 30
provides its output to the input processor 44 of the dialogue
manager 24. The input processor 44 may be programmed to
handle data conflicts, such as those which might occur when
two modes of input are used simultaneously.
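
The layering described above (a per-mode recognizer producing candidate meanings, and an input handler labeling them in one common format) might be sketched in Python as follows. This is only an illustration under stated assumptions: the class and function names, the toy lexicon and the choice of a dataclass are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class Interpretation:
    mode: str        # identifying label, e.g. "KBD" or "MIC"
    expression: str  # candidate meaning in the semantic language


def keyboard_recognizer(chars):
    """Toy recognizer: groups characters into a word and looks up its meanings."""
    lexicon = {"red": ["Color=Red()"], "green": ["Color=Green()"]}  # hypothetical
    word = "".join(chars).strip().lower()
    return lexicon.get(word, [])


class InputHandler:
    """Labels recognizer output so later modules see one format for all modes."""

    def __init__(self):
        self.recognizers = {}

    def attach(self, mode, recognizer):
        self.recognizers[mode] = recognizer

    def handle(self, mode, event):
        meanings = self.recognizers[mode](event)
        return [Interpretation(mode, m) for m in meanings]


handler = InputHandler()
handler.attach("KBD", keyboard_recognizer)
print(handler.handle("KBD", list("Red ")))
```

Keeping the mode label on each interpretation is what would let a downstream component, such as the input processor, notice that two modes were used at once.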

To handle speech and text I/O operations a plurality of 
dialogue databases 74 may be used by the recognizers/ 
generators. They describe the natural language expressions 
that can be used as well as their meaning. A dialogue 
database is composed of an alphabet, a lexicon and a 
grammar. A dialogue database can be provided for different 
languages (e.g. English, Japanese, French) as illustrated. 
The task database describes the possible expressions in the 
application domain. The system database describes the pos- 
sible expressions in the dialogue domain. Note that the 
system database contains pointers to the task database to 
build correction requests for instance. 

More specifically, the dialogue databases 74 contain
syntactic and semantic information on the language that the
dialogue server can understand and therefore process. These
databases are used for text and speech input and output.
Each database is composed of an alphabet, a lexicon and a
grammar that describes the structures (sentences or
fragments of sentences) that can be legally built along with
their associated meaning. In the presently preferred
embodiment there is a system dialogue database and a task dialogue
database. The system dialogue database describes the
structures that compose the meta-language and is application
independent. The task dialogue database describes the
structures that are specific to the application being developed.
Therefore, for each new application a task dialogue database
must be defined by the application programmer. These
databases are used for recognition by the microphone and
keyboard recognizers and for generation by the speaker and
text generators. Dialogue databases are language dependent.
Therefore, system and task dialogue databases must be
provided for each language used (e.g. English, Japanese
. . . ). Note that no other adaptation is needed since the dialogue
server and application operate and communicate at a
semantic level.
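
One way to picture the organization just described (an alphabet, a lexicon and a grammar per database, with one system and one task database per language) is the following Python sketch. The field types, the dictionary keyed by language and role, and the function name are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class DialogueDatabase:
    alphabet: frozenset  # symbols allowed in the language
    lexicon: frozenset   # known words
    grammar: dict        # rule name -> list of alternatives


def select_databases(databases, language):
    """Return the (system, task) pair for a language, or raise if one is missing."""
    try:
        return databases[(language, "system")], databases[(language, "task")]
    except KeyError as exc:
        raise LookupError(f"no dialogue database for {exc.args[0]}") from None


# Hypothetical English databases with a single rule each.
system_db = DialogueDatabase(frozenset("no"), frozenset({"no"}),
                             {"<Meta>": [["no"]]})
task_db = DialogueDatabase(frozenset("cube"), frozenset({"cube"}),
                           {"<Item>": [["cube"]]})
databases = {("English", "system"): system_db, ("English", "task"): task_db}

system, task = select_databases(databases, "English")
print(sorted(task.lexicon))
```

Under this layout, supporting another language means adding two more entries and nothing else, which matches the statement that no other adaptation is needed.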

On the data output side, the input/output manager 26
employs a similar, layered architecture. In this case,
semantic expressions from the output processor 46 of dialogue
manager 24 are supplied to the output handler 32 for
generation. The output handler 32 is mainly a dispatcher that
activates the necessary generators to generate messages to
the user. The input data to the generators is a semantic
expression. These expressions are converted to an output
event directly manageable by an output driver. Preferably,
the data delivered to output handler 32 includes a header or
other identifying label which the output handler 32 uses to
determine which output device or devices should be used to
communicate with the human user. Output handler 32
supplies the data to the appropriate generators or converters
48a-48c (collectively referred to as converters 48), which in
turn function to convert the data stream into an appropriate
signal for delivery to a specific data output device. The
presently preferred embodiment uses two output devices:
speaker (SPK) for speech output and text (TXT) for text
output. In both cases, the dialogue databases 74 are used for
generation.

In the case of speech converter 48a, the data are converted
using speech synthesis procedures to generate a data signal
suitable for driving speakers 51 through its associated
speaker driver 50. The presently preferred embodiment uses
the StTTalk synthesizer of the applicants' assignee although
any suitable speech synthesizer can be used. The speaker
driver of the preferred embodiment is built into the dialogue
server 20.

Similarly, text converter 48b assembles data in a form
suitable for display on a monitor 53. The data are passed
via I/O dispatcher 82 to the application program 22. The
text driver is located on the application side
(i.e. within application 22).

As before, an additional converter 48c has been illustrated 
to demonstrate how an additional output device might be 
coupled to the output handler 32 through an appropriate 
converter. 

The dialogue manager 24 is the primary component
responsible for the language acquisition capability of the
present invention. At the heart of dialogue manager 24 is the
central dialogue processor 60. This processor receives input
from parser 44 and provides output to output processor 46.
In addition, the central dialogue processor 60 also supplies
output to the application manager 28. Central dialogue
processor 60 uses the services of several additional software
modules which have been illustrated in FIG. 1. These
modules include a history handler 62, an instruction
interpreter 64, a script handler 66, a context handler 68, a
meta-language interpreter 42, special function handler 56,
reference solver 52 and uncertainty solver 54.
Dialogue Manager

More specifically, the Dialogue Manager 24 is the heart of
the server. It interprets the input events coming from the
user via the I/O Manager, triggers actions inside the
application via the Application Manager and eventually
sends messages back to the user via the I/O Manager. All the
processing that occurs here operates at a semantic level since
all the syntactic information is transformed (coded or
decoded) inside the I/O Manager. The Dialogue Manager is
composed of several specialized modules that allow the
manager (1) to acquire a knowledge of the application being
served via its dialogue specification (the scripts that it
runs), (2) to interpret input events in context by preserving
a dialogue history and (3) to handle errors and uncertainties.
Ultimately the strategies and the generic functions that have
been implemented in the Central Dialogue Processor define
a Dialogue Model. The model describes what the system can
handle and how it will react to any dialogue situation. The
notion of Meta-Language has been introduced to define the
expressions that the Dialogue Manager can understand. This
language is composed of (1) dialogue specific expressions,
having meaning at the dialogue level (e.g. 'Repeat,'
'Cancel'), (2) application specific expressions, having
meaning at the application level (e.g. 'Play new messages'),
and (3) mixed expressions (e.g. 'No message 2'). The
dialogue manager can be placed in a mode whereby commands
are built by the dialogue system but not actually sent
to the application program. This capability allows a user to
browse through or explore the target application's features,
without actually operating the target application. This
feature is performed by the special function handler 56,
described below.
Central Dialogue Processor 

The Central Dialogue Processor 60 coordinates the activity
between the different modules inside the Dialogue Manager.
Its role is to interpret user inputs in accordance with the
application scripts while ensuring reliable communication
(error and uncertainty management in case of speech
recognition errors, for instance, or ambiguities due to
different possible interpretations of an input) and solving
dialogue specific situations (e.g. processing of elliptical or
anaphorical forms involving the dialogue context, or
processing of requests for repetition, correction and
assistance). It is the Central Dialogue Processor that
initiates questions to the user when information is needed
by the application, or that uses default values, thereby
suppressing dialogue turns. Note that the dialogue (the
exchange of turns) is totally transparent to applications.
Applications express interactional needs to the processor via
the scripts, but it is the processor that decides how the
information is going to be obtained (through a direct
question to the user, previously buffered data, default values
or resolution strategies). When the processor is idle waiting
for user input, time-outs may be set so that actions will be
taken if no input is provided within the specified delay.
Script Handler 

The Script Handler 66 is responsible for providing the
Central Dialogue Processor with the dialogue instructions
the processor requests. Basically it serves as a dynamic
instruction database defined to decrease the communication
load between the server and the application. When an initial
request is made on a given instruction, the handler will look
for it in the database. If it is not found, a request to the
Application Manager will be made for loading. At that point
the application will be asked to transfer the specified
instruction.
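
The load-on-demand behaviour described above can be pictured with a short sketch. It is only an illustration under stated assumptions: the class name, the `load_from_application` callback (standing in for the request made through the Application Manager) and the instruction name are hypothetical and do not come from the toolkit.

```python
from typing import Callable, Dict


class ScriptHandler:
    """Caches dialogue instructions and loads missing ones on demand."""

    def __init__(self, load_from_application: Callable[[str], str]) -> None:
        # Hypothetical stand-in for the request sent via the Application Manager.
        self._load = load_from_application
        self._instructions: Dict[str, str] = {}
        self.transfers = 0  # number of round trips to the application

    def get(self, name: str) -> str:
        # Only the first request for an instruction reaches the application.
        if name not in self._instructions:
            self._instructions[name] = self._load(name)
            self.transfers += 1
        return self._instructions[name]


def fake_application(name: str) -> str:
    # Hypothetical application side: returns the text of an instruction.
    return f"<instruction {name}>"


handler = ScriptHandler(fake_application)
first = handler.get("AskWhatElse")
second = handler.get("AskWhatElse")  # served from the local database
```

Repeated requests for the same instruction are answered locally, which is how such a handler would reduce the communication load between server and application.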
Context Handler

The Context Handler 68 is in charge of the dialogue
context. The dialogue context represents the state of the
ongoing dialogue. The context is composed of a short-term
context and a long-term context. The short-term context
holds information on the immediate focus of the dialogue
(current question or current expectations) and also contains
flags representing whether a proposal was made, whether the
assistance and help modes are active, etc. On the other hand,
the long-term context represents the story that has already
been run. This task is supervised by the submodule History
Handler. In a dialogue situation long-term information is
essential to allow the interpretation of meta-language
expressions such as requests for correction and cancellation,
or to solve ellipses and references.
History Handler 

The History Handler 62 is responsible for providing the
Context Handler with long-term information on the dialogue.
In Partner the history is a circular buffer containing
the last dialogue instructions that have been executed as well
as a set of input markers and, where corrective inputs have
been made, correction markers. Therefore, unlike other
systems, the dialogue history does not contain only the
User-Machine exchanges but integrates the execution
context as well.
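
A circular history of this kind can be sketched with a bounded deque. The entry kinds ("instruction", "input", "correction"), the capacity and the sample payloads below are assumptions chosen for illustration, not details taken from the system.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class HistoryEntry:
    kind: str     # "instruction", "input" or "correction" (assumed labels)
    payload: str


class DialogueHistory:
    """Fixed-size history: the oldest entry is dropped when the buffer is full."""

    def __init__(self, capacity: int = 4) -> None:
        self._entries: Deque[HistoryEntry] = deque(maxlen=capacity)

    def record(self, kind: str, payload: str) -> None:
        self._entries.append(HistoryEntry(kind, payload))

    def last(self, kind: str) -> List[str]:
        # Payloads of the given kind, oldest first.
        return [e.payload for e in self._entries if e.kind == kind]


history = DialogueHistory(capacity=3)
history.record("input", "Call David")
history.record("instruction", "Call(David)")
history.record("correction", "No Brian")
history.record("instruction", "Call(Brian)")  # evicts the oldest entry
```

Because executed instructions sit next to the input and correction markers, a later correction or cancellation request can be related to what was actually run.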
Instruction Interpreter 

The role of the Instruction Interpreter 64 is to interpret the
dialogue instructions given for processing by the central
processor. The current system contains 15 dialogue
instructions. The interpreter evaluates the instructions and
maintains different classes and sets of variables. It also
contains functions to undo or redo sequences of instructions
when asked by the central processor, for instance when the
user makes a request for explicit correction. The dialogue
instructions can be grouped into four categories: (1)
Input/Output instructions, (2) control instructions, (3)
variable management instructions and (4) low-level application
management instructions. Application scripts are composed of
dialogue instructions and describe the application scenario
in the proposed programming language. This set of
instructions is generic and independent of the application.
Meta-Language Interpreter 

The Meta-Language interpreter 42 is responsible for the
detection and the primary handling of meta-commands
received via the input processor. Meta-commands are user
inputs which only make sense at the dialogue level, as
opposed to the application level. Such commands include
'Repeat,' 'Cancel,' 'No Message 2,' 'Speak Japanese,' etc.
Depending on the command, updates will be made on the
appropriate dialogue entities. Meta-commands involving
manipulation at the script level (correction and cancellation
requests) are then handled directly by the central processor.
Reference Solver 

The role of the Reference Solver 52 is to assist the central
processor when elliptical and anaphorical inputs are received
from the user. For instance, when the semantic input
corresponding to the command 'No Paint It Red' is received, the
pronominal referent 'It' must be resolved to understand the
command. The Reference Solver uses syntactic and class
constraints contained in the semantic expression and the
dialogue history to find the most plausible match. Then it
substitutes the real value of the reference. A similar
procedure is followed in the case of elliptical input.
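
As a rough illustration of resolving a pronoun against the dialogue history, the sketch below picks the most recently mentioned object whose class satisfies the constraint carried by the command. The history layout, the class names and the "most recent match wins" rule are simplifying assumptions; the text only states that syntactic and class constraints and the history are used to find the most plausible match.

```python
from typing import List, Optional, Tuple

# Each history item is (object name, class), most recent last (assumed layout).
History = List[Tuple[str, str]]


def resolve_reference(required_class: str, history: History) -> Optional[str]:
    """Return the most recently mentioned object of the required class."""
    for name, cls in reversed(history):
        if cls == required_class:
            return name
    return None  # nothing in the history satisfies the constraint


history: History = [("cube", "Shape"), ("David", "Person"), ("sphere", "Shape")]

# 'Paint it red' needs something paintable, so 'it' is matched against Shape.
referent = resolve_reference("Shape", history)
```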
Uncertainty Solver 

The Uncertainty Solver 54 is the module that assists the
central processor in choosing the correct hypothesis when
several candidates for a given input event are generated by
the I/O manager. When dealing with speech recognition and
natural language processing, one has to take into account
recognition errors that are introduced due to a lack of
competence or performance of the speech recognizer, and also
the inherent ambiguities introduced by the language itself.
As a consequence, for one input event (a speech act, for
instance) several possible meanings would be derived by
the speech recognizer. Note that if speech recognizers were
fully capable of recognizing speech, only one meaning
would be generated in most cases. The Uncertainty Solver
helps the central processor in doing implicit error recovery.
The dialogue context is used to find out which one of the
possible meanings makes the most sense at this point of the
dialogue.
Special Function Handler

The Special Function Handler 56 is a module whose role
is (1) to process application requests, (2) to inform
applications that certain conditions inside the dialogue server
are met, and (3) to send secondary dialogue requests to the
application. Note that primary dialogue requests (e.g. requests
for execution and requests for dialogue instructions) are
handled directly by the Central Processor and the Script
Handler. They are requests for information or processing,
while secondary dialogue requests are notification requests.

There is a wide variety of Special Functions handled by
the module that we will not detail in this survey. Here are
three examples:

1. When the dialogue server is processing a user input, a
Busy status message will be sent to the application. As
soon as it is idle waiting for user input, a Ready status
will be sent.

2. When the application wants to enter the Text Acquisition
Mode, a specific message will be sent that will
activate the mode inside the server. In that case completion
information will be sent back to the application
later on.

3. When the meta-command 'Show the Manual' has been
recognized by the Dialogue Manager, a request will be
sent to the application to open the manual window.

Input Processor

The Input Processor 44 handles the input events sent by
the I/O Manager. It has the possibility of buffering input
events in case of quick successive inputs made by the user.
To each input event is attached its source. Presently it can be
either TouchScreen, Mouse, Keyboard or Microphone.
When processing an input event, the central processor will
ask the Input Processor for the possible meanings as well as
their plausibility scores. Finally, when an input has been
processed the input event will be cleared. Functions are
also provided to configure the input devices by giving hints
on what makes sense at the dialogue level in order to obtain
a more accurate and faster recognition. The current
implementation allows configuring only the Microphone
recognizer, by means of a dynamic lexicon which indicates what
words are possible given the current dialogue context. A new
lexicon is computed after each dialogue turn. Finally, note
that the Input Processor functions under two modes for
speech. In Normal mode inputs are interpreted as received
by the Central Processor. In Fragment mode speech inputs
are concatenated instead until an execute command is
received. The Fragment mode corresponds to the Speech
Language Acquisition feature that allows users to
interactively build valid sentences word by word or fragment by
fragment by voice until a meaningful sentence has been
constructed. When this feature is enabled the Special
Function Handler, using the dialogue context, will provide
applications with a list of possible following words for display.
Output Processor

The Output Processor 46 is used to generate dialogue
messages to the user. Note that each message is a semantic
expression that will be sent to the I/O Manager for
conversion and transmission. Messages can be played through
different media at the same time. Commonly most spoken
messages are also output as text messages to keep a trace
in a dialogue window, for instance. Two output modes are
currently supported: Text and Speech output. The output
processor can also cancel speech messages being played
(the talk-over feature). In a dialogue situation
it is very common for users to answer or give commands
quickly without waiting for the system to be ready and
waiting. In such cases the speech output in progress is
cancelled as well as any pending outputs.
Application Manager

The Application Manager 28 is an interface that serves as
a communication link between the server's processing
modules (Dialogue Manager and I/O Manager) and the
application. Little processing is done here. The primary
objective of the Application Manager is to synchronize the
different requests (i.e. Dialogue and Application Requests)
and to prevent mutual blockage. Note that Dialogue Server
and Application are separate processes and that both of them
can send requests at any time. The Manager is composed of
a low level communication driver that handles the physical
ports and four specialized preprocessors.

The application manager 28 is responsible for providing
a proper interface to the target application 22. The actual
interface is effected by a communication driver 76, which
couples to or interfaces with the dialogue interface driver 70
residing in the target application. The application manager
also includes an execution handler 78 which is responsible
for sending the appropriate commands (specifically called
application functions) to the application program that
executes them. Application functions are triggered as a
result of interpretation of user inputs. The execution handler,
in effect, mimics in some ways the input to the application
program which a user would normally enter in using the
application program.

In some instances, a user may request a previous command
or instruction to be withdrawn or corrected. The
dialogue manager is able to recognize such a request as
being intended initially as a command to the dialogue
system. To the extent this command requires actions to be
taken by the target application, the execution handler 78
handles this as well. Thus for example, if the user gives the
instruction "paint the cube blue," and then immediately
thereafter instructs the dialogue system with the command
"cancel that," the central dialogue processor 60 will interpret
the command and examine the stack of previously done
actions contained in the dialogue history. It will then select
the proper undo function and tell the application through the
execution handler to execute it.
Communication Driver

The Communication Driver 76 supervises the communication
and handles the communication protocol between the
Dialogue Server and the applications. Several physical ports
are open to handle the logical connection. It is composed of
low level routines to detect the presence of data and to read
and write data on the connection.
Dialogue Instruction Handler

The Dialogue Instruction Handler 80 is used by the Script
Handler to load the dialogue instructions one by one. Its role
is to access the requested instruction via the Communication
Driver, to parse it and to format it into an internal
representation directly usable by the Script Handler.
Execution Handler

The Execution Handler 78 is used by the Central Processor
to trigger the execution of functions inside the application
(e.g. Update Function) or to get information that requires
processing (e.g. Help Function). Reference to the application
functions is made in the scripts. Typically when the
semantic expression corresponding to the command 'Play
New Messages' has been parsed by the Central Processor, a
specific processing ('PlayMessages(New)') will be
requested via the Execution Handler so that the application
will play the new messages that have arrived.
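
The following sketch shows one way an execution handler could map a parsed command to an application function and keep enough information to honour a later "cancel that" request, as in the "paint the cube blue" example given for the execution handler 78. The registry, the paired do/undo functions and the stack are assumptions made for illustration; the text does not describe how undo functions are selected internally.

```python
from typing import Callable, Dict, List, Tuple

Action = Callable[[str], None]


class ExecutionHandler:
    """Triggers application functions and remembers them for cancellation."""

    def __init__(self) -> None:
        # name -> (do, undo); both are hypothetical application functions.
        self._functions: Dict[str, Tuple[Action, Action]] = {}
        self._done: List[Tuple[str, str]] = []  # stack of executed actions

    def register(self, name: str, do: Action, undo: Action) -> None:
        self._functions[name] = (do, undo)

    def execute(self, name: str, arg: str) -> None:
        do, _ = self._functions[name]
        do(arg)
        self._done.append((name, arg))

    def cancel_last(self) -> bool:
        # Returns False when there is nothing left to cancel.
        if not self._done:
            return False
        name, arg = self._done.pop()
        _, undo = self._functions[name]
        undo(arg)
        return True


cube = {"color": "white"}      # toy application state
previous_colors: List[str] = []


def paint(color: str) -> None:
    previous_colors.append(cube["color"])
    cube["color"] = color


def unpaint(_color: str) -> None:
    cube["color"] = previous_colors.pop()


handler = ExecutionHandler()
handler.register("Paint", paint, unpaint)
handler.execute("Paint", "blue")      # "paint the cube blue"
color_after_paint = cube["color"]
handler.cancel_last()                 # "cancel that"
color_after_cancel = cube["color"]
```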
Request Handler 

The Request Handler 58 synchronizes the requests coming
from the application or from the Dialogue Manager so
that no conflict will occur. Where necessary, application
requests will be buffered temporarily until the Dialogue
Manager is in a coherent state to satisfy the request.
Dialogue requests are propagated directly to the application.
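
The asymmetric treatment of the two request directions can be sketched as follows. The readiness flag, the method names and the request strings are assumptions used only to make the buffering behaviour concrete.

```python
from collections import deque
from typing import Deque, List


class RequestHandler:
    """Buffers application requests until the Dialogue Manager can take them."""

    def __init__(self) -> None:
        self.dialogue_manager_ready = False   # assumed "coherent state" flag
        self._pending: Deque[str] = deque()
        self.delivered_to_manager: List[str] = []
        self.sent_to_application: List[str] = []

    def from_application(self, request: str) -> None:
        # Application requests wait until the Dialogue Manager is ready.
        self._pending.append(request)
        self._flush()

    def from_dialogue_manager(self, request: str) -> None:
        # Dialogue requests are propagated directly to the application.
        self.sent_to_application.append(request)

    def set_ready(self, ready: bool) -> None:
        self.dialogue_manager_ready = ready
        self._flush()

    def _flush(self) -> None:
        while self.dialogue_manager_ready and self._pending:
            self.delivered_to_manager.append(self._pending.popleft())


handler = RequestHandler()
handler.from_dialogue_manager("Execute PlayMessages(New)")
handler.from_application("EnterTextAcquisitionMode")  # manager busy: buffered
buffered_while_busy = list(handler.delivered_to_manager)
handler.set_ready(True)                               # now delivered in order
```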
I/O Dispatcher 

The I/O Dispatcher 82 is in charge of communicating
input or output information from or to the application
dependent drivers. On input, it dispatches the input events
to their appropriate recognizer (namely the Mouse recognizer
or the Keyboard recognizer). On output, it centralizes the
preprocessed events coming from the generators and
sends them through the Dialogue-Application connection.
Input events are buffered when their associated recognizer is
busy.

The Target Application

Dialogue Server 20 and applications 22 are separate
processes. The server provides a toolkit that defines the type
of connection and the communication protocol. An Application
Specification Model is provided to develop dialogue-oriented
applications using the toolkit. Application programmers
must provide (1) Task Dialogue databases, one for
each language (English, Japanese, etc.), (2) a set of scripts
and application functions that represent the application
scenario, (3) a dialogue interface driver that pilots the
Dialogue-Application connection, and (4) a Port Manager to
handle any application dependent drivers if needed.
Besides these requirements they are free to structure the
application as they wish and to use any software or hardware
to implement it.
Dialogue Interface Driver 

The Dialogue Interface Driver 70 should be typically 
composed of a main loop to catch dialogue requests and a 
variety of request functions. The type of requests available 
will not be discussed here. 
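
Since the available request types are not discussed, the sketch below uses invented ones ("Execute", "LoadInstruction", "Shutdown") purely to show the shape of a main loop that catches dialogue requests and hands each to a request function. The in-process queue is also an assumption standing in for the Dialogue-Application connection.

```python
import queue
from typing import Callable, Dict, List, Tuple

Request = Tuple[str, str]  # (request type, payload); types are hypothetical


def run_driver(requests: "queue.Queue[Request]",
               handlers: Dict[str, Callable[[str], str]]) -> List[str]:
    """Main loop: catch each dialogue request and call its request function."""
    replies: List[str] = []
    while True:
        kind, payload = requests.get()
        if kind == "Shutdown":  # assumed sentinel that ends the loop
            return replies
        handler = handlers.get(kind)
        replies.append(handler(payload) if handler else f"unsupported: {kind}")


def execute(payload: str) -> str:
    return f"executed {payload}"


def load_instruction(payload: str) -> str:
    return f"instruction body for {payload}"


inbox: "queue.Queue[Request]" = queue.Queue()
for item in [("Execute", "PlayMessages(New)"),
             ("LoadInstruction", "AskWhatElse"),
             ("Ping", ""),
             ("Shutdown", "")]:
    inbox.put(item)

replies = run_driver(inbox, {"Execute": execute,
                             "LoadInstruction": load_instruction})
```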
Task Manager 

The Task Manager 72 will contain the application inter- 
nals. Note that the application functions referenced in the 
application scripts must be defined here since the dialogue 
manager will send requests for execution on these functions. 
Port Manager 

The Port Manager 74 should be in charge of controlling
the application dependent dialogue devices. The current
system invites application programmers to define a Mouse
driver and a Keyboard driver for input and a Text driver for
output. This is optional. If no keyboard is used by the
application, the keyboard driver does not have to be defined.
Application Resources and Devices 

Application Resources and Devices represent the data and 
devices that are specific to each application and are not 
dialogue related. In the case of a Compact Disk player 
application, resources might represent a set of compact disks 
and the device might be a compact disk player controlled by 
the application. 
Scripts and Application Functions

The Scripts and Application Functions are the dialogue
specification provided by the application. They describe the
scenario, the valid operations available for this task and
their sequences. The Toolkit defines a programming language to
formalize the Human-Machine interaction. The programming
language is application independent and is composed
of elementary dialogue instructions. Dialogue instructions
are requested by the server, which interprets them. The
interpretation of user inputs, when successful, will activate
the application functions referenced in the scripts.
Task Dialogue Databases 

Task Dialogue Databases contain a definition of the
natural language expressions for text and speech
input/output. These Databases are used for recognition and
generation by the I/O Manager. A dialogue database is
composed of an alphabet, a lexicon and a grammar that describes
the possible sentences or fragments of sentences with their
meanings. A Task Dialogue Database is application dependent
since it describes the restricted natural language used
for a given application. A Task Dialogue Database must be
provided for each language (English, Japanese, etc.) the
application will use.
The Language Acquisition System 

The invention provides a language acquisition system
which can assist the user in a dialogue with a
computer-implemented application program. The language
acquisition system employs a means for storing a dialogue
context, a means for defining a dialogue model and a means for
defining a syntactic-semantic grammar. The dialogue context
is used to maintain a record of what commands have
been previously entered. The dialogue model describes the
structure of the dialogue itself. Thus, for example, a dialogue
structure might include the ability to negate a prior
command, or to provide a direct answer to a question, or to
refer to a prior command or prior dialogue. Finally, the
syntactic-semantic grammar defines the actual language
which the application program can use. The use of a unique
semantic representation formalism renders the language
acquisition system independent of any particular application
program language. This is an important advantage, since the
language acquisition system of the invention can be supplied
with data representing any desired application language. The
data representing the application language is included in the
syntactic-semantic grammar. Scripts describing dialogue
scenarios are used to parse semantic expressions.

Another advantage of the unique semantic representation
is that it readily supports multi-modal communication. The
present system is able to handle both speech and text input.
Other modes of communication can also readily be added.
The dialogue context, dialogue model and the
syntactic-semantic grammar work together to provide the user
with commands which are syntactically and semantically correct
and which can be interpreted by the dialogue server in the
current context.
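
One way to picture how only valid continuations are offered to the user is the minimal sketch below. It assumes, for simplicity, that the commands valid in the current context are available as an explicit list of word sequences; the real system derives them from the grammar, the dialogue model and the context, and the command list and the `[Enter]` marker used here are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

# Hypothetical commands that are valid in the current dialogue context.
VALID_COMMANDS: List[Tuple[str, ...]] = [
    ("call",),
    ("call", "david"),
    ("erase", "new", "messages"),
    ("play", "new", "messages"),
    ("play", "first", "message"),
]
ENTER = "[Enter]"  # offered when the words typed so far already form a command


def completion_list(prefix: Sequence[str]) -> List[str]:
    """Return the words that can legally follow the given prefix."""
    n = len(prefix)
    choices = set()
    for command in VALID_COMMANDS:
        if command[:n] != tuple(prefix):
            continue  # this command does not start with the prefix
        if len(command) == n:
            choices.add(ENTER)
        else:
            choices.add(command[n])
    return sorted(choices)


first_words = completion_list([])
after_play = completion_list(["play"])
after_call = completion_list(["call"])
```

An empty result would mean the prefix cannot be extended into any valid command, which is the situation the completion mechanism is designed to prevent.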

The dialogue system 20 may be seen as an intelligent front
end to the target application 22 for language acquisition. The
dialogue system supplies the actual execution commands to
the target application. Although the user has the experience
of communicating directly with the target application, that
communication is, in fact, monitored by and assisted by the
dialogue system 20. In addition, the dialogue system
continually and dynamically builds tree-structured data
representing what the application can do in each context. The
data are available to the user, to assist the user in acquiring
an understanding of the target application's capabilities and
the target application's instruction set and language. In some
instances, it may be desirable to allow the target application
to configure the manner in which the input/output manager
26 functions. The input/output manager 26 is therefore
provided with an input/output (I/O) configurer 81 which is
coupled to the communication driver 76 and the input
processor 44. It supplies configuration or setup requests to
the recognizers as needed. In the case of the touch screen
device, configuration requests concern the definition of
regions (size, position and semantic value) on the screen
that are used by the touch screen recognizer to interpret the
X-Y coordinates received from the touch screen driver.

More specifically, the input configurer 81 allows the
dialogue manager or the target application to configure the
different recognizers according to different needs and/or
requirements. In the case of the touch screen recognizer, for
example, the input configurer 81 will receive the
configuration request from the application that will create,
move or delete virtual regions on the screen to which specific
meanings are attached. In the case of the microphone
recognizer, the input configurer will receive language
constraints from the dialogue manager after each dialogue turn,
in order to increase speed and accuracy of the recognizer. The
constraints limit the search space to what makes sense in the
current dialogue context.

To summarize, the dialogue system of the invention
provides a contextual language acquisition mechanism
which allows the user to build valid inputs interactively. The
language acquisition mechanism corresponds to a mode in
the dialogue system which can be turned on and off at the
user's initiative. The system allows new users to operate an
interface and to acquire the application language
progressively. In this mode, speech and text inputs can be
edited and entered word by word, or character by character in
the case of text input. At each step the dialogue system
provides the next possible word choice or choices to complete
the initial input.

Completion is contextual. Only valid inputs, that is inputs
which make sense in the dialogue context, are generated.
Valid inputs are inputs that are syntactically and
semantically correct. They are predicted by the dialogue
manager from the immediate context (application context) and
the dialogue history according to the dialogue model (cf.
meta-language). The semantic constraints given by the dialogue
manager are used to generate the syntactic structures from
the application language. The text and speech completion
modes can be turned on and off at any time.

FIG. 3 shows a typical language acquisition sentence. In
the example, the dialogue window 100 displays the dialogue
between the dialogue system (Sys) and the user (Usr). In
addition, a series of speech completion windows 102a-102d
have been illustrated to show how the dialogue system might
provide information to the user about what the target
application can do in the current context. When the user
activates the language acquisition mode (if, for example, the
user does not know what to do for the next command), the
dialogue system will present a list of words or "completion
list" which corresponds to the beginning words of all valid
inputs that are syntactically and semantically correct. This is
illustrated in the speech completion window 102a. For purposes
of this example, assume that the user selects the word "no"
from the list of 102a. This selection may be by selecting it
with a mouse or touch screen or by saying the word "no" into
the microphone. The word "no" now appears on the command
line at the bottom of the screen as at 104 in window 102b.
Having entered the first word ("no"), the dialogue system
displays a second list of words, showing all possible choices
of words which can follow the word "no." This is illustrated
in the speech completion window 102b. For purposes of this
example, assume that the user selects the word "alpha" from
the list of 102b. The dialogue system then automatically
displays the list shown in speech completion window 102c.
Note that the word "alpha" now appears on the command
line following the word "no" (reference numeral 106). Note
that one of the possible choices from the list of window 102c
is [Enter]. In this case, the sentence "no alpha" makes sense
at the semantic level. Thus the user can execute it as a
command by entering [Enter]. For purposes of this example it
will be assumed that the user selects the word "in" from the
list of screen 102c. The process continues as described above
until a complete, syntactically and semantically correct
command has been entered. The dialogue window 100 shows a
history of commands which have been entered. Thus, if the
user enters the command "no alpha in blue", the command
will appear as the last entry in dialogue window 100,
following "Sys: what else?".

[...] which are used to provide language [...] to the user.
[...] the modules in the architecture involved in the Text
Language Acquisition mechanism. FIG. 2 highlights in bold
lines the modules in the architecture involved in the Speech
Language Acquisition mechanism. Note that the main difference
between the two modes comes from the fact that the MIC
[...] Keyboard driver is application dependent. Note also that
in the current implementation Text and Speech modes function
separately. Eventually both modes could be activated at the
same time. In that case two completion windows would appear
on the application display. To operate in text mode the
keyboard and the mouse (for clicking on the entry and not
typing) can be used. To operate in speech mode the microphone
or the touch screen, for instance (by pointing on the entry
and not speaking), can be used.
The Dialogue System Toolkit

The presently preferred embodiment provides a toolkit
which the application programmer can use to incorporate the
language acquisition functionality of the present invention
in an application program. A complete description of each
module of the presently preferred language acquisition toolkit
is attached at the end of this specification. The following
description makes reference to these toolkit modules, which
are all identified by a name beginning with the letters Dt ...
The interested reader may look up the designated modules in
the attached description.

OPERATIONS PERFORMED WHILE IN TEXT MODE

Specification of the Text Mode

The Text Language Acquisition mode allows users to get
assistance on the next possible word or words when they
type on the keyboard interactively. It is then possible to build
legal commands word by word. Legal commands are commands
that are syntactically correct and semantically
interpretable in the current context. The mechanism can be used
to learn the application language or to check if the input
buffer is correct or not. Typically users are shown a window
that contains the next possible words, or possible ends of
words if he or she is in the middle of a word. The entries are
listed in order of plausibility. The process continues until a
valid command is complete. At that point the command can
be executed and the process restarts until the mode is
deactivated. FIG. 4 shows a typical example. Because the
keyboard is an application dependent device, applications
are required to handle the protocol closely with the server.

Activation of the Text Mode

To enter the Text Language Acquisition mode, applications
will typically provide a button on the display or a
keyboard hot key that the user will press when needed.
When it is pressed the application will send a request to the
dialogue server (class DtApplicationKbdCompletionSessionReq)
to ask it to engage the text completion mode. The server
will reply to acknowledge.

Computation of Constraint Tree Structures

After the text mode has been activated, the dialogue manager
will first build two constraint trees that will represent all
the possible syntactic structures that make sense in the
current dialogue context. The first tree represents all the
meta-language expressions that can be used in the dialogue
context and in the dialogue domain. The second tree represents
all the expressions of the immediate dialogue context
that can be interpreted in the application domain.

The central dialogue processor 60 builds the trees by
applying constraints or "semantic masks" on the system and
task grammars. The constraints are extracted from the dialogue
history and the immediate dialogue context according
to the dialogue model. The dialogue model indicates the
structures that can be used while the long-term and short-term
contexts provide the possible values. FIG. 5 shows the
structures that are manipulated by the dialogue system and
some examples. In FIG. 5 an example telephone answering
machine application is used to illustrate the principles of the
dialogue model. In the figure the number preceding each
dialogue structure indicates the plausibility level that is
used. A plausibility level of one means that the structure is
very likely to be used. A plausibility level of five means that
expressions generated by the structure are much less probable.
The plausibility level is used to derive the final
plausibilities for each word. Plausibilities are essential to
sort candidate expressions. Expressions derived from plausibility
level one typically answer the immediate context and
will therefore be at the top of the list shown to the user.

To set constraints on the grammars and generate the trees
incrementally, the function SetConstraintSemanticCompletion
is used. It has four arguments: (1) a semantic
mask; (2) a pointer to the tree to update; (3) a pointer to the
system grammar; and (4) a pointer to the task grammar.

Semantic masks are generative expressions based on the
semantic language representation used by the server. They
may contain one or several wildcards (the character $).
The wildcard is a special character that can be substituted by
any string. It allows the generation of different structures
with a single mask. For instance, to generate all the color
names contained in a grammar, the mask Color=$() can be
used and will generate the structures Color=Red(),
Color=Yellow(), Color=Blue(), etc.

The central dialogue processor will systematically take
one after the other the top-level structures defined above
and, exploring the short-term and long-term contexts, will
build semantic masks that will be passed to the
SetConstraintSemanticCompletion function. More specifically, the
central dialogue processor, according to the dialogue model,
will search the immediate dialogue context and the
dialogue history for the parsing nodes that correspond to the
command being processed, or to the last command if no
command is currently being parsed. It will use a parsing
table attached to the nodes to get all the possible choices that
are parsable within each node. A parsing table is composed
of a list of semantic masks and is used to parse incoming
messages. Doing so, the central dialogue processor can
derive the direct answers, all the possible corrections and so
forth. Dialogue meta-command masks are simply added to
the list of masks. Plausibilities are attached when the search
is done. The masks obtained from the list will then be used
to build the dialogue and application trees. Note that the
application tree describes the structures that answer the
immediate dialogue context in the application domain. The
dialogue tree contains the rest.

Requests for Completion Pages

Once the text completion mode is engaged, the application
is allowed to ask the server what are the next possible
word candidates that can be typed to complete the initial
buffer. Most often the text completion mode will be started
with no leading characters, in which case the application will
get all the possible "first" words which start valid sentences.
Because the number of words that can be obtained is
variable, the request does not concern the entire list, but
rather part of the list, a completion page of fixed size.
Therefore several requests (class
DtApplicationKbdCompletionPageReq) may be necessary
to load the entire list. Note that it is the application that
initiates the communication for completion pages. When
receiving the request for the first completion page, the server
will compute the list, given the two constraint trees and
given the initial input buffer sent by the application as a
parameter.

Display of the List

Once all the possible completions have been acquired
from the server, the application is expected to display the list
so that the user can consult it. The words should be displayed
in plausibility order. Note that this is possible since a
plausibility score is attached to each word sent by the server.
Typically the list is displayed as a pop-up window next to the
keyboard input buffer.

Content of the List

The list contains the possible next words, or ends of words
if a word is partially typed in the input buffer. It may also
contain three special words which are: (1) [Enter] to indicate
that the string of characters typed represents a sentence that
is complete and interpretable; (2) [Space] to indicate the
separator between words; (3) [Backspace] to go back and
erase the last word typed. It is then possible to navigate
within the structure, visualize the possible sentences and
commands available and execute them.

Interaction

Typically after each character typed on the keyboard, a
new set of completion page requests will be sent to the server to
acquire the new list and update the completion window.
Another implementation could be to update at the word
level. Note that when an illegal character is typed the list will
be empty.

Validation of an Input While in Text Completion Mode

To validate an input either the carriage return key can be
used or the entry [Enter] in the completion list can be
selected, with the mouse for example. The input string is then
sent to the dialogue server the way it is sent in normal mode.
The server will then start its interpretation phase and run the
appropriate application scripts until an input is needed. At
that point the server will check if the text completion mode
is still active. If the text completion mode is still active, the
server will build the two constraint trees relative to the new
dialogue context and signal the application that the dialogue
context has changed. On the application side, when receiving
the notification of a new dialogue context, the application
will also check to see if the text completion mode is active.
If it is active, the application will request the completion
pages as mentioned before and finally display the list in the
pop-up window.
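
The semantic masks described in this section (for example Color=$()) can be illustrated with a small matcher. This is only a sketch under stated assumptions: the specification describes masks as generative and applies them to grammars through SetConstraintSemanticCompletion, whereas the code below merely tests whether a fully specified semantic expression is an instance of a mask. The function names mask_matches and select_by_mask are hypothetical and are not part of the toolkit.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if expr is an instance of mask, where each '$' in mask stands
 * for any (possibly empty) string and every other character is literal. */
int mask_matches(const char *mask, const char *expr)
{
    assert(mask != NULL && expr != NULL);
    if (*mask == '\0')
        return *expr == '\0';
    if (*mask == '$') {
        /* Try every possible length for the wildcard substitution. */
        for (const char *p = expr; ; p++) {
            if (mask_matches(mask + 1, p))
                return 1;
            if (*p == '\0')
                return 0;
        }
    }
    return *mask == *expr && mask_matches(mask + 1, expr + 1);
}

/* Stores in out[] the indices of the expressions that match the mask
 * and returns how many were stored. out[] must hold at least n entries. */
size_t select_by_mask(const char *mask, const char *const exprs[],
                      size_t n, size_t out[])
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (mask_matches(mask, exprs[i]))
            out[k++] = i;
    return k;
}
```

With the expressions Color=Red(), Name=David() and Color=Blue(), the mask Color=$() selects the first and third entries, which mirrors how a single mask stands for a family of structures.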

Deactivation of the Text Completion Mode 

To terminate the text completion mode the application
will send a deactivation message to the server. At that point
the server will no longer accept completion page requests and
it will no longer build the constraint trees. Applications will
also remove the completion window from the screen.
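
The activation and deactivation rules above amount to a small piece of server-side state. The following is a minimal sketch, assuming a single Boolean flag per session; ServerSession, PageStatus and the function names are hypothetical illustrations, not toolkit identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { PAGE_OK, PAGE_REJECTED } PageStatus;

typedef struct {
    bool text_mode_active;   /* true between activation and deactivation */
} ServerSession;

void session_init(ServerSession *s)
{
    assert(s != NULL);
    s->text_mode_active = false;
}

/* Models the session request that turns the text completion mode on or off. */
void session_set_text_mode(ServerSession *s, bool on)
{
    assert(s != NULL);
    s->text_mode_active = on;
}

/* Page requests are only honoured while the mode is engaged;
 * page numbers are assumed to start at 1. */
PageStatus session_request_page(const ServerSession *s, int page)
{
    assert(s != NULL);
    if (!s->text_mode_active || page < 1)
        return PAGE_REJECTED;
    return PAGE_OK;
}
```

A request sent before activation or after deactivation is rejected, which corresponds to the server no longer accepting completion pages once the mode has been terminated.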
Flowchart 

FIG. 6 represents graphically the operations that typically 
take place in the text mode. In FIG. 6 the dialogue server and 
application program are represented by two vertically 
arranged columns 200 and 202, respectively. Between those
two columns is the connection column 204 which sets forth 
the toolkit module used to perform the given connection. 

OPERATIONS PERFORMED WHILE IN 
SPEECH MODE 
Specification of the Speech Mode 

The Speech Language Acquisition mode allows users to
get assistance on the next possible words when they use
fragmented speech input with the microphone interactively.
It is then possible to build legal commands word by word.
Legal commands are commands that are syntactically correct
and semantically interpretable in the current context.
The mechanism can be used to learn the application language
or to overcome possible recognition errors while speaking
in standard mode. Typically users are shown a window
that contains the next possible words. The entries are listed
in plausibility order. The process continues until a valid
command is complete. At that point it can be executed and
the process restarts until the mode is deactivated. FIG. 7
shows a typical example.

Because the microphone is an application independent 
device, the protocol is simplified since the dialogue server 
internally handles the speech input buffer. Applications are 
only requested to display the list of words after each 
utterance. 

Activation of the Speech Mode 

To enter the Speech Language Acquisition mode, applications
will also provide a button on the display or a
keyboard hot key that the user will press when needed.
When it is pressed the application will send a request to the
dialogue server (class DtApplicationMicCompletionSessionReq)
to ask it to engage the speech completion mode. The server
will reply to acknowledge.

Computation of Constraint Tree Structures 

After the speech mode has been activated, the dialogue
manager will first build two constraint trees that will represent
all the possible syntactic structures that make sense in
the current dialogue context. The procedure is similar to the
text mode except that speech grammars are used instead of
text grammars.

Requests for Completion Pages 

Once the speech completion mode is engaged, the application
is allowed to ask the server what are the next possible
word candidates that can be uttered to complete the initial
buffer. Note that when the mode is engaged the speech buffer
will be empty. Because the number of words that can be obtained is
variable, the request again does not concern the entire list
but rather only part of the list, a completion page of fixed
size. Therefore several requests (class
DtApplicationMicCompletionPageReq) may be necessary
to load the entire list. Note that it is the application that
initiates the communication for completion pages.
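
Loading a list page by page, as described above, can be sketched from the application side as a simple loop. The Page layout, the FetchPageFn callback, the PAGE_SIZE value and the in-memory FakeServer are all assumptions made for illustration; the actual page structure of the toolkit is opaque and its transport is not shown here.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4   /* hypothetical fixed page size */

/* Hypothetical page layout, used only for this sketch. */
typedef struct {
    int total_pages;               /* pages needed for the whole list */
    int count;                     /* entries actually present on this page */
    const char *words[PAGE_SIZE];
} Page;

/* Hypothetical transport callback: fetch page number `page` (1-based).
 * Returns 0 on success, nonzero on failure. */
typedef int (*FetchPageFn)(void *server, int page, Page *out);

/* Requests pages 1..total_pages and appends their words to list[]. */
size_t load_completion_list(FetchPageFn fetch, void *server,
                            const char *list[], size_t cap)
{
    size_t n = 0;
    int total = 1;                 /* unknown until the first page arrives */

    assert(fetch != NULL && list != NULL);
    for (int page = 1; page <= total; page++) {
        Page p;
        if (fetch(server, page, &p) != 0)
            break;
        total = p.total_pages;
        for (int i = 0; i < p.count && n < cap; i++)
            list[n++] = p.words[i];
    }
    return n;
}

/* In-memory stand-in for the dialogue server, for demonstration only. */
typedef struct {
    const char *const *words;
    int n;
} FakeServer;

int fake_fetch(void *server, int page, Page *out)
{
    const FakeServer *s = server;
    if (page < 1)
        return -1;
    out->total_pages = (s->n + PAGE_SIZE - 1) / PAGE_SIZE;
    out->count = 0;
    for (int i = (page - 1) * PAGE_SIZE; i < s->n && out->count < PAGE_SIZE; i++)
        out->words[out->count++] = s->words[i];
    return 0;
}
```

The loop only learns the total number of pages from the first reply, which is why it starts with a provisional total of one.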




Display of the List 

Once all the completions have been acquired from the
server, the application is expected to display the list so that
the user can consult it. The words should be displayed in
plausibility order. Note that this is possible since a plausibility
score is attached to each word sent by the server.
Typically the list is displayed as a pop-up window next to the
keyboard input buffer.
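
Ordering the received entries for display can be sketched with the standard qsort function. The Entry type is hypothetical, and the sketch assumes that a lower score means a more plausible word, in line with plausibility level one being the most likely; the specification does not state the direction of the per-word score.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    const char *word;
    int plausibility;   /* assumption: lower value = more plausible */
} Entry;

/* Comparison callback for qsort: ascending plausibility value. */
static int by_plausibility(const void *a, const void *b)
{
    const Entry *x = a;
    const Entry *y = b;
    return (x->plausibility > y->plausibility) -
           (x->plausibility < y->plausibility);
}

/* Sorts the entries so that the most plausible word comes first.
 * qsort is not stable, so entries with equal scores may be reordered. */
void sort_for_display(Entry *entries, size_t n)
{
    assert(entries != NULL || n == 0);
    qsort(entries, n, sizeof entries[0], by_plausibility);
}
```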
Content of the List 

The list contains the possible next words, or ends of words
if a word has already been partially typed in the input buffer.
It may also contain two special words which are: (1) [Enter]
to indicate that the string of characters that has been typed
represents a sentence that is complete and interpretable; (2)
[Backspace] to go back and erase the last word typed. It is
then possible to navigate within the structure by voice,
visualize the possible sentences and commands available
and execute them.
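
The notion of "ends of words" for a partially entered word can be made concrete with a short helper. This is a sketch only: ends_of_words is a hypothetical name, and in the actual system the candidates come from the constraint trees rather than from a flat array.

```c
#include <assert.h>
#include <string.h>

/* For each candidate that starts with the partially entered word, stores a
 * pointer to its remaining characters (the "end of word") in out[].
 * Candidates equal to the partial word are skipped, since nothing remains
 * to be completed. Returns the number of ends stored; out[] must hold at
 * least n entries. An empty partial word yields the whole words. */
size_t ends_of_words(const char *partial, const char *const candidates[],
                     size_t n, const char *out[])
{
    size_t len, k = 0;

    assert(partial != NULL && candidates != NULL && out != NULL);
    len = strlen(partial);
    for (size_t i = 0; i < n; i++) {
        if (strncmp(candidates[i], partial, len) == 0 &&
            candidates[i][len] != '\0')
            out[k++] = candidates[i] + len;
    }
    return k;
}
```

For the candidates message, messages, memo and call, the partial word "mess" yields the ends "age" and "ages", while an illegal prefix yields an empty list.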
Interaction 

In speech mode users may pronounce single words or
sequences of words. Typically after each fragment pronounced
by the user, the server will automatically send the
first completion page to signal the application that a new list
should be displayed. At that point the application will
request the remaining pages from the server. When recognition
errors occur or illegal words are uttered, no progression
is made in the structure and a warning message is sent
to the user.

Validation of an Input While in Speech Completion Mode 

To validate an input either the word "Enter" can be said
or the entry [Enter] in the completion list can be selected
with the mouse. The input buffer maintained by the input
processor is then tested against the speech grammars to
derive its semantic meaning. The server will then start its
interpretation phase and run the appropriate application
scripts until new input is requested from the user. At that
point the server will check if the speech completion mode is
still active. If it is still active the server will build the two
constraint trees relative to the new dialogue context and
signal the application that the dialogue context has changed.
On the application side, when receiving the notification of a
new dialogue context, the application will clear the currently
displayed list and be prepared to receive the first completion
page from the server relative to the new expressions that can
be said in the new context.

Deactivation of the Speech Completion Mode 

To terminate the speech completion mode applications
will send a deactivation message to the server. At that point
the server will no longer send the initial completion page. On the
application side, applications will remove the completion
window.
Flowchart 

FIG. 8 represents graphically the operations that typically
take place in the speech mode. FIG. 8 is arranged in vertical
columns, similar to those of FIG. 6.
Motivation for a Real-Time Processing 

In order for the system to be efficient, usable and therefore
accepted by users, real-time interaction between the server and
the application is a key factor. This is the reason why constraint
trees are built after each new dialogue context. These trees
are intermediate data structures that the multitude of
completion requests that follow work upon. Because the
trees are built once as intermediate data structures, no slowdown
is perceived by the user.
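
The build-once, query-many pattern described above can be sketched as a small cache keyed by the dialogue context. TreeCache, the integer context identifier and the build counter are hypothetical; the counter merely stands in for the construction of the two constraint trees.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int context_id;   /* context the trees were last built for */
    int valid;        /* nonzero once trees exist */
    int builds;       /* number of tree constructions so far */
} TreeCache;

void cache_init(TreeCache *c)
{
    assert(c != NULL);
    c->context_id = 0;
    c->valid = 0;
    c->builds = 0;
}

/* Called once per new dialogue context: the only place trees are rebuilt. */
void cache_on_new_context(TreeCache *c, int context_id)
{
    assert(c != NULL);
    c->context_id = context_id;
    c->valid = 1;
    c->builds++;      /* stands in for building both constraint trees */
}

/* Called for every completion request: returns 1 if the request can be
 * answered from the existing trees, without any rebuilding. */
int cache_can_serve(const TreeCache *c, int context_id)
{
    assert(c != NULL);
    return c->valid && c->context_id == context_id;
}
```

Any number of completion requests within one context leaves the build counter unchanged, which is the property that keeps the interaction responsive.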

FIGS. 9 and 10 illustrate how constraints are added to the
constraint tree and how the completion list is obtained from
the constraint tree. More specifically, FIG. 9 presents an
example of a syntactic-semantic grammar for a very simple
language. As illustrated, the syntactic-semantic grammar
comprises a set of rules, which in the example have been
designated with rule numbers #0-#14. The rules are used to
define different attributes and they may therefore be grouped
accordingly. For example, group 150 comprising rules
#0-#3 defines the attribute Color. The default field is used
when a group could not be recognized because of recognition
errors or omissions. The default field indicates the
default semantic to be used. In the case of the Color group
its value is 'Color=?()'. The ? marker indicates that the color
name could not be recognized. The dialogue manager will
most certainly ask specific questions on the unspecified
value later on in the dialogue. Similarly, group 152 defines
the attribute Name. Group 154 defines the attribute Item;
group 156 defines the attribute NounPhrase. Note that the
rules which define NounPhrase define phrases which in turn
use other attributes, such as the attributes Color, Item and
Name. Finally, group 158 defines the attribute TopLevel.
The TopLevel attribute has three rules, #12, #13 and #14,
representing what can be said.

Each of the rules comprising the grammar includes a
description of the syntax, shown in the region designated
160, and a corresponding semantic expression shown in the
region designated 162. The syntactic information of region
160 shows how each of the defined members of a given
attribute is spelled. In the case of a NounPhrase, the
"spelling" includes the proper whitespace locations as well
as the order of attributes which define the phrase. The
semantic expressions of region 162, corresponding to each
of the syntactic expressions of region 160, are expressed
using a nomenclature which is independent of language. The
grammar defined in this way represents a static knowledge
of all possible expressions to which the target application
can respond.

FIG. 10 illustrates how a constraint tree is built using the
grammar of FIG. 9 for the semantic expression
"Item=$(Attributes[Color=$()])." In building the tree, semantic
masks are built from the dialogue context. One example of
a semantic mask is illustrated in FIG. 11. Semantic masks
are partially contained in the parsing table and also built
from the dialogue history, as shown in FIG. 11 for the
specific example. This semantic expression represents a
mask which can match several syntactic expressions of the
grammar (e.g. the red cube, the black sphere, etc.). The
presently preferred implementation utilizes the semantic
expression mask by following the procedure illustrated in
FIG. 10.

Building the Constraint Trees

Constraint trees are built to serve as intermediate structures
that will speed up the process and the interaction while
in text or speech completion mode. They describe all the
possible inputs the user can make that make sense in the
current context. The trees reflect syntactic constraints on the
grammars that restrict the possible sentences that can be
generated. To restrict or constrain the generation, semantic
masks are applied on the grammars. The function
SetConstraintSemanticCompletion is used to incrementally
generate the constraint trees. The masks are computed
by the central dialogue processor based on the immediate
dialogue context and the dialogue history according to the
dialogue model.

Dialogue Model

The dialogue model describes the expressions that can be
understood by the dialogue manager in terms of structure.
See FIG. 5. While the dialogue model describes the logical form
that can be used, the dialogue context stored in the dialogue
manager provides the values, which are for the most part
application dependent. Given a context, the generated masks
will represent (1) all the possible direct answers, (2) all the
over-informative answers, (3) all the elliptical answers, (4)
all the corrective answers, and (5) all the built-in dialogue
meta-commands.

The immediate dialogue context is used to derive form
expressions of plausibility level 1 (form #1). The dialogue
history is used to derive forms of plausibility levels 2-4
(forms #2-4). Built-in dialogue meta-commands (level 5 or
form #5) are application independent. To summarize, the
dialogue model is a tool that allows the dialogue server to
predict the user's inputs.

Application dependent information is provided via scripts
and more precisely in the parsing nodes (or Read instructions).
They are used to parse inputs from the user. Each node
will describe all the parsing alternatives that are allowed,
given in a parsing table, and each possible path will point to
a sub-parsing node or just end. By examining the possible
alternatives the user could have used, the dialogue manager
is able to construct all the masks of forms #2, #3 and #4.
Typically many masks will be generated, since corrective
input can operate on any piece of information given in the
last command, for instance.

Form #1 inputs are simply derived from the current
parsing table, and form #5 inputs are fixed and, for the most
part, always available (e.g. cancel, repeat and help).

FIG. 11 gives detailed information on the computation
of forms #1, #2, #3 and #4. In FIG. 11, possible
answers include: "Pause," "Raise the volume," "no 6," "no
track 6," "no play track 6," and "no raise the volume."

Incremental construction of trees and requests made on trees

The masks that are generated are applied on the grammars
with the function SetConstraintSemanticCompletion. FIG.
10 shows the process.

The trees are constructed only once, after each new
dialogue context, but are used several times to build the
different lists of next possible words in the text and speech
completion modes. At that time the completion algorithm
will use the grammars and the constraint trees to generate the
lists of words given the leading strings. This computation
is not computationally expensive and allows a real-time
interaction between the dialogue server and the application
in charge of displaying the lists to the user.

Summary

From the foregoing it will be seen that the present
invention provides a highly useful multimodal dialogue
environment and user interface to assist a user in acquiring
the language of an application or computer program using
text and speech input. The system allows users unfamiliar
with the language or available commands of an application
or computer program to progressively build sentences which
will have meaning to the application or computer program.

Although the invention has been described in connection
with a general purpose computer application, it will be
understood that the invention does not depend on the nature of
the application program. As such, the application program
could be embedded in an electronic device, such as a VCR,
telephone, multimedia system or other consumer electronic
product.

While the invention has been described in connection
with the presently preferred embodiment, it will be understood
that the invention is capable of modification without
departing from the spirit of the invention as set forth in the
appended claims.
APPENDIX

The DtDialogueKbdCompletionSessionReq Request

The DtDialogueKbdCompletionSessionReq request is a
request by which the server notifies the application whether
the text completion is turned On or Off. Note that it is the
dialogue server that turns the completion On and Off after
a request from the user. To set the text completion mode On
or Off, applications send a DtApplicationKbdCompletionSessionReq
request to the server. Applications are
encouraged to signal the completion state.

typedef struct
{
    int Class;
    Boolean State;
} DtDialogueKeyboardCompletionSessionRequest;

where:

Boolean State: Boolean indicating whether the completion
will be turned On or Off.

To reply to a DtDialogueKbdCompletionSessionReq
request applications must send a DtApplicationKbdCompletionSessionAns
class message to the server for acknowledgment.

typedef struct
{
    int Class;
} DtApplicationKeyboardCompletionSessionAnswer;

The DtDialogueMicCompletionSessionReq Request

The DtDialogueMicCompletionSessionReq request is a
request by which the server notifies the application whether
the speech completion is turned On or Off. Note that it is the
dialogue server that turns the completion On and Off after a
request from the user. To set the speech completion mode On
or Off, applications send a DtApplicationMicCompletionSessionReq
request to the server. Applications are
encouraged to signal the completion state.

typedef struct
{
    int Class;
    Boolean State;
} DtDialogueMicrophoneCompletionSessionRequest;

where:

Boolean State: Boolean indicating whether the speech
completion will be turned On or Off.

To reply to a DtDialogueMicCompletionSessionReq
request applications must send a DtApplicationMicCompletionSessionAns
class message to the server for acknowledgment.

typedef struct
{
    int Class;
} DtApplicationMicrophoneCompletionSessionAnswer;

The DtCompletionPage Structure

The DtCompletionPage structure is used to transfer
completion pages to applications that made a request for
keyboard completion (see DtApplicationCompletionRequestReq
request). Each page contains the possible words
(or ends of words) that complete the current keyboard buffer.
The maximum number of lines for each page is limited to
DtCompletionPageSize. If the completion is larger than
DtCompletionPageSize elements, more than one page will
be sent (see DtDialogueCompletionPageReq request).
Typically, applications are expected to display a pop-up
window showing the possible words.

The DtGetCompletionPageDescription Function

This function allows applications to get global information
about the page.

Synopsis:

void DtGetCompletionPageDescription(CplPage, TotalPage,
    TotalCount, Page, Count)

where:

struct DtCompletionPage *CplPage: pointer to the structure to
access,

int *TotalPage: integer indicating the total number of
pages for the current completion request from the
application,

int *TotalCount: integer representing the total number of
entries for the current completion request,

int *Page: index of the current page,

int *Count: number of entries in the page.

Note that the server will send the pages in ascending
order. The last page to be received will have a number
equal to TotalPage.

The DtGetItemCompletionPage Function

This function allows applications to get the information
relative to a page entry.

Synopsis:

void DtGetItemCompletionPage(CplPage, Ptr, Text)

where:

* DtCompletionPage *CplPage: pointer to the structure to
access,

* int Ptr: index of the entry to access in the page,

* Text: string containing one next word (or end of word).

Note that the following special strings may be returned:

* <Space> to indicate a space character,

* <Return> to indicate that the current buffer is a valid
command ready to be sent.

It is important to note that the application receives the message
DtDialogueDialogueContextReq after each change in the dialogue
context. This can be used to ask for new pages when the
completion mode is on. Example:

{
    if (DtGetDialogueCompletionState()) ...
    ApplMsg.Class = DtAp...
    DtPutDialogueAnswer((*Appl)...
}

The DtDialogueKbdCompletionPageReq Request

The DtDialogueKbdCompletionPageReq request is a
request by which the server notifies the application that a
completion page is ready for display. When typing on the
keyboard, completion requests (see DtApplicationKbdCompletionPageReq
request) might be sent to the server, and after
examining the dialogue context the server will send back a
list of pages (i.e. a list of DtDialogueKbdCompletionPageReq
requests) containing the next valid words. A
DtCompletionPage structure is used to represent the page.
This is an opaque structure that has been presented earlier.
Note that the complete completion list is obtained after all
the pages have been received. The completion mechanism is
contextual and helps users learn the command language used
to interact with applications.
typedef struct
{
    int Class;
    DtCompletionPage Page;
} DtDialogueKeyboardCompletionPageRequest;

where:

DtCompletionPage Page: Page to be displayed that is part
of the completion list.

To reply to a DtDialogueKbdCompletionPageReq request
applications must send a DtApplicationKbdCompletionPageAns
class message to the server for acknowledgment.



typedef struct
{
    int Class;
} DtApplicationKeyboardCompletionPageAnswer;



The DtDialogueMicCompletionPageReq Request

The DtDialogueMicCompletionPageReq request is a
request by which the server notifies the application that a
completion page is ready for display. When giving voice
commands, completion requests (see DtApplicationMicCompletionPageReq
request) might be sent to the server,
and after examining the dialogue context the server will send
back a list of pages (i.e. a list of DtDialogueMicCompletionPageReq
requests) containing the next valid words. A
DtCompletionPage structure is used to represent the page.
Note that the complete completion list is obtained after all
the pages have been received. The completion mechanism is
contextual and helps users learn the command language used
to interact with applications. For the microphone completion
mode, the application is notified by the server of a change in
the dialogue context by a DtDialogueDialogueContextReq.
When this request is received the application should check
if the microphone completion mode is on, and if so, request
completion pages.

Due to the properties inherent to speech, the speech
completion mode has different features as compared to the
text completion mode. In particular, in the speech completion
mode the user can give several words at a time, or the
complete sentence if desired. In other words, the speech
completion mode uses fragments instead of words.
Furthermore, in the text completion mode the completion
buffer is handled by the application, but in the speech
completion mode it is handled by the server.



typedef struct
{
    int Class;
    DtCompletionPage Page;
} DtDialogueMicrophoneCompletionPageRequest;



where:

DtCompletionPage Page: Page to be displayed that is part
of the completion list.

To reply to a DtDialogueMicCompletionPageReq request
applications must send a DtApplicationMicCompletionPageAns
class message to the server for acknowledgment.



typedef struct
{
    int Class;
} DtApplicationMicrophoneCompletionPageAnswer;
The DtApplicationKbdCompletionPageReq Request

The DtApplicationKbdCompletionPageReq request is a
request for a text completion. When typing on the keyboard,
applications might send a completion request corresponding
to the current keyboard buffer. Because the list of choices
can be very large, the completion mechanism is based on the
notion of completion pages, and the
DtApplicationKbdCompletionPageReq request is more
precisely a request for a completion page.
Each page contains the possible words (or ends of words)
completing the initial string. For each request the
server will send back a completion page (see
DtDialogueKbdCompletionPageReq request). Given a dialogue context,
the first request should concern page number 1. Information
on the total size of the list is provided on each page
returned by the server.
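
The server-side step of extracting one page from a computed list can be sketched as follows. PageView and extract_page are hypothetical; their fields loosely mirror the values reported for a page (total pages, total count, page index, entry count), but the real DtCompletionPage structure is opaque and its layout is not known from the text.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int page;                   /* 1-based index of this page */
    int total_pages;            /* pages needed for the whole list */
    int total_count;            /* entries in the whole list */
    int count;                  /* entries on this page */
    const char *const *items;   /* first entry of this page */
} PageView;

/* Fills *out with the requested page of list[]. Returns 0 on success and
 * -1 if the arguments are invalid or the page does not exist. An empty
 * list still answers page 1, with zero entries. */
int extract_page(const char *const list[], int total_count,
                 int page_size, int page, PageView *out)
{
    int total_pages, first, remaining;

    assert(out != NULL);
    if (page_size <= 0 || total_count < 0)
        return -1;
    total_pages = (total_count + page_size - 1) / page_size;
    if (page < 1 || page > (total_pages > 0 ? total_pages : 1))
        return -1;

    first = (page - 1) * page_size;
    remaining = total_count - first;
    out->page = page;
    out->total_pages = total_pages;
    out->total_count = total_count;
    out->count = remaining < page_size ? remaining : page_size;
    out->items = list + first;
    return 0;
}
```

Because every page carries the totals, the application can decide after the first reply how many further requests it needs to send.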



typedef struct
{
    int Class;
    int Page;
    DtDialogueString Text;
} DtApplicationKeyboardCompletionPageRequest;

where:

* int Page: Page number to load,

* DtDialogueString Text: String to complete.

To reply to a DtApplicationKbdCompletionPageReq
request the server will compute the completion (if needed),
extract the desired page and send a DtDialogueKbdCompletionPageAns
class message to the application.



typedef struct
{
    int Class;
} DtDialogueKeyboardCompletionPageAnswer;



The following is an example of a function which sends a
DtApplicationKbdCompletionPageReq to the server. Example:

void SendDialogueKeyboardCompletionPageRequest(Buffer, Page)
String Buffer;
int Page;
{
    DtApplicationMessage ApplMsg;
    DtDialogueMessage DlgMsg;

    strcpy(ApplMsg.KeyboardCom...Value, Buffer);
    ApplMsg.KeyboardCompletion...
    DtPutDialogueRequest((*Appl)...
    DtGetDialogueAnswer((*Appl)...
}



The DtApplicationMicCompletionPageReq Request

The DtApplicationMicCompletionPageReq request is a
request for a speech completion. If the speech completion
mode is on, applications might send a completion request
after each change of the dialogue context. For each request,
the server will send back a completion page (see
DtDialogueMicCompletionPageReq request). Given a dialogue
context, the first request should concern page number 1.
Information on the total size of the list is provided on each
page returned by the server.
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typedef struct 
{ 

im Class; 
int P*$e; 

DlDsabgueString Text; 
}DtAw!i 



typedef struct 
inl Class; 



c Request; 



where: 

* int Page: Page number to load, 

* DtDialogucString Text: String to complete. 

To reply to a DtApplicationMicCompletionPagcRcq 
request the server will compute the completion (if needed), 
extract the desired page and send a DtDialogueMicComplc- 
tionPagcAns class message to the application. 



The DtApplicationKbdCompletionSessionReq Request

The DtApplicationKbdCompletionSessionReq is a request
from the application to ask the server to set the text
completion mode On or Off.

typedef struct
{
    int Class;
    Boolean State;
} DtApplicationKeyboardCompletionSessionRequest;

where:

Boolean State: Boolean indicating whether the text
completion mode should be turned on or off.

To reply to a DtApplicationKbdCompletionSessionReq
request the server sends a DtDialogueKbdCompletionSessionAns
class message to the application.

typedef struct
{
    int Class;
} DtDialogueKeyboardCompletionSessionAnswer;

The following is an example of a function which sends a
DtApplicationKbdCompletionSessionReq to the server. Example:

void SendDialogueKeyboardCompletionSessionRequest(State)
Boolean State;
{
    DtApplicationMessage ApplMsg;
    DtDialogueMessage DlgMsg;

    ApplMsg.Class = DtApplicationKbdCompletionSessionReq;
    ApplMsg.KeyboardCompletionSessionRequest.State = State;
    DtPutDialogueRequest((*Appl)...);
}

The DtApplicationMicCompletionSessionReq Request

The DtApplicationMicCompletionSessionReq is a request
from the application to ask the server to set the speech
completion mode On or Off.

typedef struct
{
    int Class;
    Boolean State;
} DtApplicationMicrophoneCompletionSessionRequest;

where:

Boolean State: Boolean indicating whether the speech
completion mode should be turned on or off.

To reply to a DtApplicationMicCompletionSessionReq
request the server sends a DtDialogueMicCompletionSessionAns
class message to the application.

typedef struct
{
    int Class;
} DtDialogueMicrophoneCompletionSessionAnswer;



What is claimed is: 

1. A supervised contextual restricted natural language 
acquisition system for computerized applications, compris- 
ing: 

first means for defining and storing a dialogue history 
context; 

second means for defining and storing a dialogue model; 
third means, for defining and storing at least one syntactic- 
semantic grammar; 
fourth means responsive to said first, second and third 
means for building a language to individually represent 
both dialogue-specific expressions and application spe- 
cific application expressions and means to represent 
said application-specific expressions as at least one 
prediction tree representing at least one possible dia- 
logue that is semantically consistent with the stored 
dialogue history context and stored dialogue model and 
fifth means to interpret said dialogue-specific expres- 
sions to supply instructions to the language acquisition 
system; and 

user interface means for providing language assistance to
the user based on said prediction tree. 

2. The system of claim 1 wherein said third means
includes means for defining and storing at least one
application-specific grammar and at least one
system-specific grammar.

3. The system of claim 1 wherein said second means
includes means for defining and storing a dialogue model
which defines the type of interaction in terms of dialogue
answers.

4. The system of claim 1 wherein said third means 
includes a database of grammar rules arranged as uniquely 
labeled rules. 

5. The system of claim 1 in which the restricted natural
language is used to communicate with a target application
and wherein said third means includes a database of scripts
comprising possible dialogue scenarios supported by said
target application.

6. The system of claim 1 in which the restricted natural
language is used to communicate with a target application
and further comprising means for establishing a
communication channel between a user and said target application.

7. The system of claim 6 wherein said first means includes
a mechanism for monitoring the communication channel and
for making a record of the context state of the target
application during interaction with the application by the

8. The system of claim 6 wherein said fourth means has
mechanism for monitoring the communication channel and
for rebuilding at least one prediction tree automatically as a
user interacts with the target application over said
communication channel.

9. The system of claim 6 further comprising multi-modal
communication support mechanism coupled to said
communication channel whereby the user may communicate
with the application using at least voice and manual input
modes.

10. The system of claim 1 wherein said first means
includes a mechanism for recording the context state of the
language acquisition system as a user interacts with it
through said user interface.

11. The system of claim 1 wherein said user interface
includes multi-modal input mechanism whereby the user
may communicate with the language acquisition system
using at least voice and manual input modes.

12. An interactive restricted natural language acquisition
system for a computerized application comprising:

an application manager coupled to said computerized
application for providing an interface to said
computerized application;

at least one dialogue database containing syntactic and
semantic information regarding a language;

an input/output manager having at least one generator for
generating messages using said dialogue database and
having at least one recognizer for processing user input
using said dialogue database and for extracting
semantic information from said user input;

a dialogue manager coupled to said application manager
and to said input/output manager for interpreting said
semantic information and for selectively issuing
commands through said application manager to the
computerized application in response to said semantic
information;

said dialogue manager including means for selectively
building first and second tree structures based on
current and historical user interaction said tree structures
representing dialogue-specific and application-specific
information respectively whereby the user is provided
with language acquisition assistance on an interactive
basis.

13. The system of claim 12 wherein said system includes
user actuable means for invoking the building of said tree
structure, whereby language acquisition assistance is
activated at the user's initiative.

14. The system of claim 12 wherein said dialogue
manager includes context handler for maintaining a first record
of a short-term context and a second record of a long-term
context, said context handler automatically updating said
first and second records in response to interaction by the
user.

15. The system of claim 12 wherein said input/output
manager comprises a multimodal interface supporting at
least text and speech input.

16. The system of claim 12 wherein said input/output
manager comprises a multimodal interface supporting at
least text and speech output.

17. The system of claim 12 wherein said dialogue
manager includes a dialogue processor for defining a dialogue
model which supports both meta-commands and application
commands used for the interaction.

18. The system of claim 12 wherein said application
manager comprises a toolkit of communications functions
for enabling the exchange of information between said
computerized application and said dialogue server.

19. The system of claim 12 wherein said dialogue
manager includes means for selectively building an ordered list
arranged in order of a predetermined plausibility.

20. The system of claim 12 wherein said application
manager communicates with said dialogue manager through
application scripts that describe the application scenario at
the dialogue level.

21. The system of claim 12 wherein said dialogue
manager has a speech mode for handling speech input consisting
of words and sequences of words.

22. The system of claim 12 wherein said dialogue
manager has a text mode for handling text input consisting of
words and sequences of words.

23. The system of claim 12 wherein said input/output
manager comprises a multimodal interface supporting
mouse input.

24. The system of claim 12 wherein said input/output
manager comprises a multimodal interface supporting touch
screen input.

25. A language acquisition system to assist a user in a
dialogue with a computer-implemented application
program, comprising:

input system for supporting at least one of text and speech
modes of input;

a first means for storing a dialogue context;

a second means for defining a dialogue model which
describes the structure of a dialogue;

a third means for defining at least one syntactic-semantic
grammar;

fourth means coupled to said first, second and third means
to interactively assist the user in building commands
which to the application program are syntactically and
semantically correct and which can be interpreted by
the dialogue manager in the current context;

said fourth means for generating assistance differently
based on the mode of input supplied to said input
system.

26. The system of claim 25 wherein the fourth means
generates assistance selectively at a character-by-character,
word-by-word or phrase-by-phrase basis depending on the
mode of input.

27. The language acquisition system of claim 25 wherein
said third means defines a restricted natural language
grammar.

28. The language acquisition system of claim 25 further
comprising means for supplying actions resulting from said
commands to said application program.

29. The language acquisition system of claim 28 further
comprising user selectable means for selectively building
said commands and controlling whether said commands will
be executed by said application program or not.

30. The language acquisition system of claim 25 wherein
said first, second and third means are interactively coupled
together to build said commands progressively.

31. The language acquisition system of claim 30 wherein
said commands are built progressively in accordance with at
least one syntactic-semantic grammar such that user errors
in syntax are automatically detected.

32. The language acquisition system of claim 30 wherein
said commands are built progressively in accordance with at
least one syntactic-semantic grammar such that user errors
in semantics are automatically detected.

33. The language acquisition system of claim 25 wherein
a semantic representation defined in the syntactic-semantic
grammar renders said language acquisition system
independent of the application program language.

34. The language acquisition system of claim 25 wherein
said third means is constructed to receive data describing at
least one syntactic-semantic grammar and for establishing a
set of rules based on said data for building said commands,
whereby the language acquisition system can be adapted to
work with a plurality of different languages.

35. The language acquisition system of claim 25 wherein
said third means is constructed to receive data describing at
least one syntactic-semantic grammar and for establishing a
set of rules based on said data for building said commands,
whereby the language acquisition system can be adapted to
work with a plurality of modes of communication, including
text and speech.

36. The language acquisition system of claim 25 wherein
said first, second and third means are configured to define a
first mechanism which is independent of any application
program and to define a second mechanism which is
dependent on a particular application program.

37. The language acquisition system of claim 25 wherein
said dialogue model supports a plurality of different types of
dialogues including (a) negation of a prior dialogue;
(b) direct answer, and (c) reference to a prior dialogue.

* * * * *